PATENT

Attorney Docket No.: 19496-003020US 



ZINC FINGER PROTEIN COMPOSITIONS 

CROSS-REFERENCES TO RELATED APPLICATIONS 
The present application claims priority to U.S. provisional applications
60/126,238, filed March 24, 1999, 60/126,239, filed March 24, 1999, 60/146,596, filed
July 30, 1999, and 60/146,615, filed July 30, 1999, all of which are incorporated by
reference in their entirety for all purposes.

BACKGROUND 

Zinc finger proteins (ZFPs) are proteins that can bind to DNA in a 
sequence-specific manner. Zinc fingers were first identified in the transcription factor 
TFIIIA from the oocytes of the African clawed toad, Xenopus laevis. An exemplary 
motif characterizing one class of these proteins (C2H2 class) is -Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His
(where X is any amino acid) (SEQ. ID. No:1). A single finger domain is about
30 amino acids in length, and several structural studies have demonstrated that it contains 
an alpha helix containing the two invariant histidine residues and two invariant cysteine 
residues in a beta turn co-ordinated through zinc. To date, over 10,000 zinc finger 
sequences have been identified in several thousand known or putative transcription 
factors. Zinc finger domains are involved not only in DNA-recognition, but also in RNA 
binding and in protein-protein binding. Current estimates are that this class of molecules 
will constitute about 2% of all human genes. 

The x-ray crystal structure of Zif268, a three-finger domain from a murine
transcription factor, has been solved in complex with a cognate DNA-sequence and
shows that each finger can be superimposed on the next by a periodic rotation. The
structure suggests that each finger interacts independently with DNA over 3 base-pair
intervals, with side-chains at positions -1, 2, 3 and 6 on each recognition helix making
contacts with their respective DNA triplet subsites. The amino terminus of Zif268 is
situated at the 3' end of the DNA strand with which it makes most contacts. Some zinc
fingers can bind to a fourth base in a target segment. If the strand with which a zinc
finger protein makes most contacts is designated the target strand, some zinc finger
proteins bind to a three base triplet in the target strand and a fourth base on the nontarget
strand. The fourth base is complementary to the base immediately 3' of the three base
subsite.

The structure of the Zif268-DNA complex also suggested that the DNA
sequence specificity of a zinc finger protein might be altered by making amino acid
substitutions at the four helix positions (-1, 2, 3 and 6) on each of the zinc finger
recognition helices. Phage display experiments using zinc finger combinatorial libraries
to test this observation were published in a series of papers in 1994 (Rebar et al., Science
263, 671-673 (1994); Jamieson et al., Biochemistry 33, 5689-5695 (1994); Choo et al.,
PNAS 91, 11163-11167 (1994)). Combinatorial libraries were constructed with
randomized side-chains in either the first or middle finger of Zif268 and then used to
select for an altered Zif268 binding site in which the appropriate DNA sub-site was
replaced by an altered DNA triplet. Further, correlation between the nature of introduced
mutations and the resulting alteration in binding specificity gave rise to a partial set of
substitution rules for design of ZFPs with altered binding specificity.

Greisman & Pabo, Science 275, 657-661 (1997) discuss an elaboration of 
the phage display method in which each finger of Zif268 was successively randomized
and selected for binding to a new triplet sequence. This paper reported selection of ZFPs 
for a nuclear hormone response element, a p53 target site and a TATA box sequence. 

A number of papers have reported attempts to produce ZFPs to modulate
particular target sites. For example, Choo et al., Nature 372, 645 (1994), report an
attempt to design a ZFP that would repress expression of a bcr-abl oncogene. The target
segment to which the ZFPs would bind was a nine base sequence 5'GCA GAA GCC3'
chosen to overlap the junction created by a specific oncogenic translocation fusing the
genes encoding bcr and abl. The intention was that a ZFP specific to this target site
would bind to the oncogene without binding to the abl or bcr component genes. The authors
used phage display to screen a mini-library of variant ZFPs for binding to this target
segment. A variant ZFP thus isolated was then reported to repress expression of a stably
transfected bcr-abl construct in a cell line.

Pomerantz et al., Science 267, 93-96 (1995) reported an attempt to design
a novel DNA binding protein by fusing two fingers from Zif268 with a homeodomain
from Oct-1. The hybrid protein was then fused with a transcriptional activator for
expression as a chimeric protein. The chimeric protein was reported to bind a target site
representing a hybrid of the subsites of its two components. The authors then constructed
a reporter vector containing a luciferase gene operably linked to a promoter and a hybrid
site for the chimeric DNA binding protein in proximity to the promoter. The authors
reported that their chimeric DNA binding protein could activate expression of the
luciferase gene.

Liu et al., PNAS 94, 5525-5530 (1997) report forming a composite zinc 
finger protein by using a peptide spacer to link two component zinc finger proteins each 
having three fingers. The composite protein was then further linked to a transcriptional
activation domain. It was reported that the resulting chimeric protein bound to a target
site formed from the target segments bound by the two component zinc finger proteins. It 
was further reported that the chimeric zinc finger protein could activate transcription of a 
reporter gene when its target site was inserted into a reporter plasmid in proximity to a 
promoter operably linked to the reporter. 

Choo et al., WO 98/53058, WO 98/53059, and WO 98/53060 (1998)
discuss selection of zinc finger proteins to bind to a target site within the HIV Tat gene. 
Choo et al. also discuss selection of a zinc finger protein to bind to a target site 
encompassing a site of a common mutation in the oncogene ras. The target site within ras 
was thus constrained by the position of the mutation. 

The present application is related to commonly owned copending applications 
09/229,007 filed January 12, 1999 and 09/229,037 filed January 12, 1999. 

SUMMARY OF THE CLAIMED INVENTION 
Tables 1-5 show the amino acid sequences of a large collection of zinc 
finger proteins and corresponding target sites bound by the proteins. Nucleotide 
sequences of target sites are shown in Col. 2. Target sites typically have 9 or 10 bases 
and constitute three target subsites bound by respective zinc finger components of a 
multifinger protein. Amino acid sequences of zinc finger components are shown in cols. 
4, 6 and 8. The amino acids shown occupy positions -1 to +6 of a zinc finger. Table 6 
shows consensus sequences for zinc fingers and target subsites bound by the fingers. Col. 
1 shows the nucleotides occupying a target subsite. Cols. 2-4 show amino acids 
occupying positions -1 to +6 of zinc fingers binding to a target subsite. 

Accordingly, the invention provides zinc fingers having amino acid
sequences and target subsite binding specificities shown in Table 6. As an example, a zinc
finger having the amino acid sequence DXSNXXR at positions -1 to +6 has a target
subsite GAC. As another example, a zinc finger having the amino acid sequence
RX(D/S)NXXR at positions -1 to +6 has a target subsite of GAG. A zinc finger having
an amino acid sequence TXGNXXR at positions -1 to +6 has the target subsite GAT. A
zinc finger having the sequence (Q/T)XSNXXR at positions -1 to +6 binds to a target
subsite GAT. A zinc finger having an amino acid sequence QXG(S/D)XXR at positions
-1 to +6 binds to a target subsite GCA. A zinc finger having an amino acid sequence
RXDEXXR binds to a target subsite GCG. A zinc finger having an amino acid sequence
QXSDXXR at positions -1 to +6 binds to a target subsite GCT. A zinc finger having an
amino acid sequence QX(G/A)HXXR at positions -1 to +6 binds to a target subsite GGA.
A zinc finger having an amino acid sequence DXSHXXR binds to a target subsite GGC.
A zinc finger having an amino acid sequence RXDHXXR at positions -1 to +6 binds to a
target subsite GGG. A zinc finger having an amino acid sequence RXDAXXR at
positions -1 to +6 binds to a target subsite GTG.

The invention further provides nucleic acids encoding zinc fingers,
including all of the zinc fingers described above.

The invention further provides segments of a zinc finger
comprising a sequence of seven contiguous amino acids as shown in any of Tables 1-5.
The invention also provides nucleic acids encoding any of these segments and zinc
fingers comprising the same.

The invention further provides zinc finger proteins comprising
first, second and third zinc fingers. The first, second and third zinc fingers comprise
respectively first, second and third segments of seven contiguous amino acids as shown in
a row of Tables 1-5. The invention further provides nucleic acids encoding such zinc
finger proteins.

BRIEF DESCRIPTION OF THE FIGURE 
Fig. 1 shows assembly of nucleic acids encoding zinc finger binding
proteins.

DEFINITIONS 

A zinc finger DNA binding protein is a protein or segment within a larger
protein that binds DNA in a sequence-specific manner as a result of stabilization of
protein structure through coordination of a zinc ion. The term zinc finger DNA binding
protein is often abbreviated as zinc finger protein or ZFP.



A designed zinc finger protein is a protein not occurring in nature whose 
design/composition results principally from rational criteria. Rational criteria for design 
include application of substitution rules and computerized algorithms for processing 
information in a database storing information of existing ZFP designs and binding data.

A selected zinc finger protein is a protein not found in nature whose 
production results primarily from an empirical process such as phage display. 

The term naturally-occurring is used to describe an object that can be 
found in nature as distinct from being artificially produced by man. For example, a 
polypeptide or polynucleotide sequence that is present in an organism (including viruses) 
that can be isolated from a source in nature and which has not been intentionally modified 
by man in the laboratory is naturally-occurring. Generally, the term naturally-occurring 
refers to an object as present in a non-pathological (undiseased) individual, such as would 
be typical for the species. 

A nucleic acid is operably linked when it is placed into a functional 
relationship with another nucleic acid sequence. For instance, a promoter or enhancer is 
operably linked to a coding sequence if it increases the transcription of the coding 
sequence. Operably linked means that the DNA sequences being linked are typically 
contiguous and, where necessary to join two protein coding regions, contiguous and in 
reading frame. However, since enhancers generally function when separated from the 
promoter by up to several kilobases or more and intronic sequences may be of variable 
lengths, some polynucleotide elements may be operably linked but not contiguous. 

A specific binding affinity between, for example, a ZFP and a specific
target site means a binding affinity of at least 1 x 10^6 M^-1.
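Because the tables report binding as a dissociation constant (Kd, in nM) while the threshold above is stated in M^-1, it can help to express the threshold in the same units as the tables. The following is a minimal sketch, assuming the usual reciprocal relationship Kd = 1/Ka; the function and constant names are illustrative and are not part of the application.

```python
# Illustrative unit conversion, assuming Kd = 1 / Ka.
# Names below are hypothetical helpers, not terms used in the application.

SPECIFIC_BINDING_KA = 1e6  # threshold from the definition above, in M^-1


def ka_to_kd_nM(ka_per_molar: float) -> float:
    """Convert an association constant (M^-1) to a dissociation constant (nM)."""
    return 1e9 / ka_per_molar


def is_specific(kd_nM: float) -> bool:
    """True if a Kd in nM corresponds to an affinity of at least 1 x 10^6 M^-1."""
    return kd_nM <= ka_to_kd_nM(SPECIFIC_BINDING_KA)


print(ka_to_kd_nM(SPECIFIC_BINDING_KA))  # threshold expressed in nM
print(is_specific(10.0))                 # a 10 nM binder meets the threshold
```

Under this assumption, an affinity of 1 x 10^6 M^-1 corresponds to a Kd of 1000 nM, so smaller Kd values in the tables indicate tighter binding than the threshold.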

The terms "modulating expression," "inhibiting expression" and "activating
expression" of a gene refer to the ability of a zinc finger protein to activate or inhibit
transcription of a gene. Activation includes prevention of subsequent transcriptional
inhibition (i.e., prevention of repression of gene expression) and inhibition includes
prevention of subsequent transcriptional activation (i.e., prevention of gene activation).
Modulation can be assayed by determining any parameter that is indirectly or directly
affected by the expression of the target gene. Such parameters include, e.g., changes in
RNA or protein levels, changes in protein activity, changes in product levels, changes in
downstream gene expression, changes in reporter gene transcription (luciferase, CAT,
beta-galactosidase, GFP (see, e.g., Misteli & Spector, Nature Biotechnology 15:961-964
(1997))); changes in signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g., cGMP, cAMP, IP3, and
Ca2+), cell growth, neovascularization, in vitro, in vivo, and ex vivo. Such functional
effects can be measured by any means known to those skilled in the art, e.g.,
measurement of RNA or protein levels, measurement of RNA stability, identification of
downstream or reporter gene expression, e.g., via chemiluminescence, fluorescence,
colorimetric reactions, antibody binding, inducible markers, ligand binding assays;
changes in intracellular second messengers such as cGMP and inositol triphosphate (IP3);
changes in intracellular calcium levels; cytokine release, and the like.

A "regulatory domain" refers to a protein or a protein subsequence that has
transcriptional modulation activity. Typically, a regulatory domain is covalently or
non-covalently linked to a ZFP to modulate transcription. Alternatively, a ZFP can act alone,
without a regulatory domain, or with multiple regulatory domains to modulate
transcription.

A D-able subsite within a target site has the motif 5'NNGK3'. A target
site containing one or more such motifs is sometimes described as a D-able target site. A
zinc finger appropriately designed to bind to a D-able subsite is sometimes referred to as
a D-able finger. Likewise a zinc finger protein containing at least one finger designed or
selected to bind to a target site including at least one D-able subsite is sometimes referred
to as a D-able zinc finger protein.
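As an illustration of the 5'NNGK3' motif, the following sketch scans a target site for D-able subsites. It assumes the standard IUPAC ambiguity codes (N is any base, K is G or T), which this passage does not itself spell out; the function name is hypothetical.

```python
import re
from typing import List

# Assumption: IUPAC codes, N = A/C/G/T and K = G/T.
# A zero-width lookahead is used so that overlapping motifs are all reported.
D_ABLE = re.compile(r"(?=[ACGT]{2}G[GT])")


def find_d_able_subsites(target_5to3: str) -> List[int]:
    """Return 0-based start positions of every 5'NNGK3' motif in the target."""
    return [m.start() for m in D_ABLE.finditer(target_5to3.upper())]


# Example: the Zif268 target 5'GCGTGGGCG3'
print(find_d_able_subsites("GCGTGGGCG"))
```

A target site for which the function returns a non-empty list would be a D-able target site in the sense defined above.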

DETAILED DESCRIPTION 

I. General

Tables 1-5 list a collection of nonnaturally occurring zinc finger protein
sequences and their corresponding target sites. The first column of each table is an
internal reference number. The second column lists a 9 or 10 base target site bound by a
three-finger zinc finger protein, with the target sites listed in 5' to 3' orientation. The
third column provides SEQ ID NOs for the target site sequences listed in column 2. The
fourth, sixth and eighth columns list amino acid residues from the first, second and third
fingers, respectively, of a zinc finger protein which recognizes the target sequence listed
in the second column. For each finger, seven amino acids, occupying positions -1 to +6
of the finger, are listed. The numbering convention for zinc fingers is defined below.
Columns 5, 7 and 9 provide SEQ ID NOs for the amino acid sequences listed in columns
4, 6 and 8, respectively. The final column of each table lists the binding affinity (i.e., the
Kd in nM) of the zinc finger protein for its target site. Binding affinities are measured as
described below.

Each finger binds to a triplet of bases within a corresponding target
sequence. The first finger binds to the first triplet starting from the 3' end of a target site,
the second finger binds to the second triplet, and the third finger binds the third (i.e., the
5'-most) triplet of the target sequence. For example, the RSDSLTS finger (SEQ ID
NO: 646) of SBS# 201 (Table 2) binds to 5'TTG3', the ERSTLTR finger (SEQ ID
NO: 851) binds to 5'GCC3' and the QRADLRR finger (SEQ ID NO: 1056) binds to
5'GCA3'.
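The finger-to-triplet assignment described above can be sketched as follows. The nine-base site 5'GCAGCCTTG3' is inferred from the three triplets given for SBS# 201 in the text; the function name is hypothetical, and the sketch handles only nine-base sites because the position of any tenth base is not specified in this passage.

```python
from typing import Dict


def triplets_by_finger(target_5to3: str) -> Dict[int, str]:
    """Map finger number (1 = first finger) to its triplet, written 5' to 3'."""
    site = target_5to3.replace(" ", "").upper()
    if len(site) != 9:
        raise ValueError("this sketch handles nine-base target sites only")
    triplets = [site[i:i + 3] for i in range(0, 9, 3)]
    # The first finger binds the 3'-most triplet, so reverse the 5'-to-3' order.
    return {n: t for n, t in enumerate(reversed(triplets), start=1)}


print(triplets_by_finger("GCA GCC TTG"))
```

For this site the first finger is assigned TTG, the second GCC and the third GCA, matching the example in the text.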

Table 6 lists a collection of consensus sequences for zinc fingers and the 
target sites bound by such sequences. Conventional one letter amino acid codes are used 
to designate amino acids occupying consensus positions. The symbol "X" designates a 
nonconsensus position that can in principle be occupied by any amino acid. In most zinc 
fingers of the C2H2 type, binding specificity is principally conferred by residues -1, +2, 
+3 and +6. Accordingly, consensus sequences determining binding specificity typically
include at least these residues. Consensus sequences are useful for designing zinc fingers
to bind to a given target sequence. Residues occupying other positions can be selected
based on sequences in Tables 1-5, or other known zinc finger sequences. Alternatively,
these positions can be randomized with a plurality of candidate amino acids and screened
against one or more target sequences to refine or improve binding
specificity. In general, the same consensus sequence can be used for design of a zinc
finger regardless of the relative position of that finger in a multi-finger zinc finger
protein. For example, the sequence RXDNXXR can be used to design an N-terminal,
central or C-terminal finger of a three-finger protein. However, some consensus sequences
are most suitable for designing a zinc finger to occupy a particular position in a multi- 
finger protein. For example, the consensus sequence RXDHXXQ is most suitable for 
designing a C-terminal finger of a three-finger protein. 
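As an illustration of how a consensus sequence can be applied, the following sketch checks a seven-residue sequence (positions -1 to +6) against the consensus sequences recited in the Summary. The pattern-to-regex translation and the function names are assumptions made for illustration; the example input RSDELTR is the seven residues preceding the first invariant histidine of Zif268 finger 1 as given in this application.

```python
import re
from typing import List

# Consensus sequences (positions -1 to +6) and target subsites as recited in
# the Summary. "X" is any residue; "(A/B)" is a choice between residues.
CONSENSUS = [
    ("DXSNXXR", "GAC"), ("RX(D/S)NXXR", "GAG"), ("TXGNXXR", "GAT"),
    ("(Q/T)XSNXXR", "GAT"), ("QXG(S/D)XXR", "GCA"), ("RXDEXXR", "GCG"),
    ("QXSDXXR", "GCT"), ("QX(G/A)HXXR", "GGA"), ("DXSHXXR", "GGC"),
    ("RXDHXXR", "GGG"), ("RXDAXXR", "GTG"),
]


def consensus_to_regex(consensus: str) -> str:
    """Translate the patent-style notation into a regular expression string."""
    # "(D/S)" becomes the character class "[DS]"; "X" becomes ".".
    body = re.sub(r"\(([A-Z/]+)\)",
                  lambda m: "[" + m.group(1).replace("/", "") + "]",
                  consensus)
    return "^" + body.replace("X", ".") + "$"


def predicted_subsites(helix: str) -> List[str]:
    """Return the subsites of every consensus that the seven residues match."""
    return [site for pattern, site in CONSENSUS
            if re.match(consensus_to_regex(pattern), helix.upper())]


print(predicted_subsites("RSDELTR"))
```

A sequence that matches no listed consensus returns an empty list, which in this sketch means only that the Summary recites no consensus for it.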

II. Characteristics of Zinc Finger Proteins 

Zinc finger proteins are formed from zinc finger components. For
example, zinc finger proteins can have one to thirty-seven fingers, commonly having 2, 3,
4, 5 or 6 fingers. A zinc finger protein recognizes and binds to a target site (sometimes
referred to as a target segment) that represents a relatively small subsequence within a
target gene. Each component finger of a zinc finger protein can bind to a subsite within
the target site. The subsite includes a triplet of three contiguous bases all on the same
strand (sometimes referred to as the target strand). The subsite may or may not also
include a fourth base on the opposite strand that is the complement of the base
immediately 3' of the three contiguous bases on the target strand. In many zinc finger
proteins, a zinc finger binds to its triplet subsite substantially independently of other
fingers in the same zinc finger protein. Accordingly, the binding specificity of a zinc
finger protein containing multiple fingers is usually approximately the aggregate of the
specificities of its component fingers. For example, if a zinc finger protein is formed
from first, second and third fingers that individually bind to triplets XXX, YYY, and
ZZZ, the binding specificity of the zinc finger protein is 3'XXX YYY ZZZ5'.

The relative order of fingers in a zinc finger protein from N-terminal to C-
terminal determines the relative order of triplets in the 3' to 5' direction in the target. For
example, if a zinc finger protein comprises from N-terminal to C-terminal first, second
and third fingers that individually bind, respectively, to triplets 5'GAC3', 5'GTA3' and
5'GGC3' then the zinc finger protein binds to the target segment 3'CAGATGCGG5'. If
the zinc finger protein comprises the fingers in another order, for example, second finger,
first finger, third finger, then the zinc finger protein binds to a target segment comprising
a different permutation of triplets, in this example, 3'ATGCAGCGG5' (see Berg & Shi,
Science 271, 1081-1085 (1996)). The assessment of binding properties of a zinc finger
protein as the aggregate of its component fingers may, in some cases, be influenced by
context-dependent interactions of multiple fingers binding in the same protein.
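The relationship between finger order and target segment can be sketched as follows, using the triplets from the example above. The function names are hypothetical, and the sketch ignores the context-dependent effects just mentioned.

```python
from typing import List


def target_segment_5to3(triplets_n_to_c: List[str]) -> str:
    """Assemble the target segment, written 5' to 3'.

    Each triplet is written 5' to 3'; fingers are listed from N-terminal to
    C-terminal. The N-terminal finger binds the 3'-most triplet, so the
    triplets are joined in reverse order.
    """
    return "".join(reversed(triplets_n_to_c))


def written_3to5(seq_5to3: str) -> str:
    """Write the same strand in the 3' to 5' direction (no complementing)."""
    return seq_5to3[::-1]


site = target_segment_5to3(["GAC", "GTA", "GGC"])
print(site)                # the segment written 5' to 3'
print(written_3to5(site))  # the segment as written in the text, 3' to 5'

# Second finger, first finger, third finger:
print(written_3to5(target_segment_5to3(["GTA", "GAC", "GGC"])))
```

Writing the assembled strand 3' to 5' reproduces the two target segments given in the text for the two finger orders.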

Two or more zinc finger proteins can be linked to have a target specificity 
that is the aggregate of that of the component zinc finger proteins (see e.g., Kim & Pabo, 
PNAS 95, 2812-2817 (1998)). For example, a first zinc finger protein having first, second 
and third component fingers that respectively bind to XXX, YYY and ZZZ can be linked 
to a second zinc finger protein having first, second and third component fingers with 
binding specificities AAA, BBB and CCC. The binding specificity of the combined first
and second proteins is thus 3'XXXYYYZZZ___AAABBBCCC5', where the underline
indicates a short intervening region (typically 0-5 bases of any type). In this situation, the 
target site can be viewed as comprising two target segments separated by an intervening 
segment. 



Linkage can be accomplished using any of the following peptide linkers:
TGEKP (SEQ. ID. No:2) (Liu et al., 1997, supra); (G4S)n (SEQ. ID. No:3) (Kim et
al., PNAS 93, 1156-1160 (1996)); GGRRGGGS (SEQ. ID. No:4); LRQRDGERP (SEQ.
ID. No:5); LRQKDGGGSERP (SEQ. ID. No:6); LRQKD(G3S)2ERP (SEQ. ID. No:7).
Alternatively, flexible linkers can be rationally designed using computer programs
capable of modeling both DNA-binding sites and the peptides themselves or by phage
display methods. In a further variation, noncovalent linkage can be achieved by fusing
two zinc finger proteins with domains promoting heterodimer formation of the two zinc
finger proteins. For example, one zinc finger protein can be fused with fos and the other
with jun (see Barbas et al., WO 95/19431).

Linkage of two zinc finger proteins is advantageous for conferring a
unique binding specificity within a mammalian genome. A typical mammalian diploid
genome consists of 3 x 10^9 bp. Assuming that the four nucleotides A, C, G, and T are
randomly distributed, a given 9 bp sequence is present ~23,000 times. Thus a ZFP
recognizing a 9 bp target with absolute specificity would have the potential to bind to
~23,000 sites within the genome. An 18 bp sequence is present once in 3.4 x 10^10 bp, or
about once in a random DNA sequence whose complexity is ten times that of a
mammalian genome.
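The figures above can be reproduced with a short calculation. The sketch below assumes that a site may occur on either strand (a factor of two), which is an assumption chosen because it matches the stated numbers rather than something the text states explicitly; the names are illustrative.

```python
GENOME_BP = 3e9  # genome size stated in the text


def expected_occurrences(site_length: int,
                         genome_bp: float = GENOME_BP,
                         strands: int = 2) -> float:
    """Expected number of occurrences of one given site in random sequence.

    Assumes equal base frequencies; strands=2 counts both DNA strands
    (an assumption made to match the figures in the text).
    """
    return strands * genome_bp / 4 ** site_length


def bp_per_occurrence(site_length: int, strands: int = 2) -> float:
    """Length of random sequence expected to contain the site once."""
    return 4 ** site_length / strands


print(round(expected_occurrences(9)))  # on the order of 23,000
print(f"{bp_per_occurrence(18):.1e}")  # on the order of 3.4e10
```

With these assumptions a 9 bp site is expected roughly 23,000 times in 3 x 10^9 bp, while an 18 bp site is expected about once per 3.4 x 10^10 bp, roughly ten times the stated genome size.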

A component finger of a zinc finger protein typically contains about 30
amino acids and has the following motif (N-C) (SEQ. ID. No:8):

Cys-(X)2-4-Cys-X.X.X.X.X.X.X.X.X.X.X.X-His-(X)3-5-His
                        -1 1 2 3 4 5 6  7

The two invariant histidine residues and two invariant cysteine residues in 
a single beta turn are co-ordinated through zinc (see, e.g., Berg & Shi, Science 271, 1081- 
1085 (1996)). The above motif shows a numbering convention that is standard in the 
field for the region of a zinc finger conferring binding specificity. The amino acid on the 
left (N-terminal side) of the first invariant His residue is assigned the number +6, and
other amino acids further to the left are assigned successively decreasing numbers. The
alpha helix begins at residue 1 and extends to the residue following the second conserved
histidine. The entire helix is therefore of variable length, between 11 and 13 residues.

The process of designing or selecting a nonnaturally occurring or variant 
ZFP typically starts with a natural ZFP as a source of framework residues. The process of 




design or selection serves to define nonconserved positions (i.e., positions -1 to +6) so as 
to confer a desired binding specificity. One suitable ZFP is the DNA binding domain of 
the mouse transcription factor Zif268. The DNA binding domain of this protein has the 
amino acid sequence: 

YACPVESCDRRFSRSDELTRHIRIHTGQKP (F1) (SEQ. ID. No:9)
FQCRICMRNFSRSDHLTTHIRTHTGEKP (F2) (SEQ. ID. No:10)
FACDICGRKFARSDERKRHTKIHLRQK (F3) (SEQ. ID. No:11)
and binds to a target 5'GCG TGG GCG3' (SEQ ID No: 12).
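To connect the motif and numbering convention to these sequences, the following sketch locates the C2H2 motif in each Zif268 finger and extracts the seven residues at positions -1 to +6, i.e., the seven residues immediately preceding the first invariant histidine. The regular expression is one possible rendering of the motif given above, and the function name is hypothetical.

```python
import re

# One rendering of Cys-(X)2-4-Cys-(X)12-His-(X)3-5-His as a regular expression.
# The capture group holds the twelve residues between the second Cys and the
# first His.
C2H2_MOTIF = re.compile(r"C.{2,4}C(.{12})H.{3,5}H")

ZIF268_FINGERS = {
    "F1": "YACPVESCDRRFSRSDELTRHIRIHTGQKP",
    "F2": "FQCRICMRNFSRSDHLTTHIRTHTGEKP",
    "F3": "FACDICGRKFARSDERKRHTKIHLRQK",
}


def recognition_residues(finger: str) -> str:
    """Return the residues at positions -1, 1, 2, 3, 4, 5 and 6 of a finger."""
    match = C2H2_MOTIF.search(finger)
    if match is None:
        raise ValueError("no C2H2 motif found")
    # Position +6 is immediately N-terminal to the first invariant His, so
    # positions -1 to +6 are the last seven of the twelve captured residues.
    return match.group(1)[-7:]


for name, sequence in ZIF268_FINGERS.items():
    print(name, recognition_residues(sequence))
```

The extracted seven-residue strings are in the same form as the finger sequences listed in columns 4, 6 and 8 of Tables 1-5.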

Another suitable natural zinc finger protein as a source of framework 
residues is Sp-1. The Sp-1 sequence used for construction of zinc finger proteins 
corresponds to amino acids 531 to 624 in the Sp-1 transcription factor. This sequence is 
94 amino acids in length. The amino acid sequence of Sp-1 is as follows: 
PGKKKQHICHIQGCGKVYGKTSHLRAHLRWHTGERP 
FMCTWSYCGKRFTRSDELQRHKRTHTGEKK 
FACPECPKRFMRSDHLSKHIKTHQNKKG (SEQ. ID. No: 13)
Sp-1 binds to a target site 5'GGG GCG GGG3' (SEQ ID No: 14). 

An alternate form of Sp-1, an Sp-1 consensus sequence, has the following 
amino acid sequence: 
meklrngsgd 

PGKKKQHACPECGKSFSKSSHLRAHQRTHTGERP 
YKCPECGKSFSRSDELQRHQRTHTGEKP 

YKCPECGKSFSRSDHLSKHQRTHQNKKG (SEQ. ID. No: 15) (lower case letters are a
leader sequence from Shi & Berg, Chemistry and Biology 1, 83-89 (1995)). The optimal
binding sequence for the Sp-1 consensus sequence is 5'GGGGCGGGG3' (SEQ ID No:
16). Other suitable ZFPs are described below.

There are a number of substitution rules that assist rational design of some 
zinc finger proteins (see Desjarlais & Berg, PNAS 90, 2256-2260 (1993); Choo & Klug, 
PNAS 91, 11163-11167 (1994); Desjarlais & Berg, PNAS 89, 7345-7349 (1992); 
Jamieson et al., supra; Choo et al., WO 98/53057, WO 98/53058; WO 98/53059; WO 
98/53060). Many of these rules are supported by site-directed mutagenesis of the three- 
finger domain of the ubiquitous transcription factor, Sp-1 (Desjarlais and Berg, 1992; 
1993). One of these rules is that a 5' G in a DNA triplet can be bound by a zinc finger 
incorporating arginine at position 6 of the recognition helix. Another substitution rule is
that a G in the middle of a subsite can be recognized by including a histidine residue at
position 3 of a zinc finger. A further substitution rule is that asparagine can be
incorporated to recognize A in the middle of a triplet, aspartic acid, glutamic acid, serine or
threonine can be incorporated to recognize C in the middle of a triplet, and amino acids
with small side chains such as alanine can be incorporated to recognize T in the middle of
a triplet. A further substitution rule is that the 3' base of a triplet subsite can be recognized
by incorporating the following amino acids at position -1 of the recognition helix:
arginine to recognize G, glutamine to recognize A, glutamic acid (or aspartic acid) to
recognize C, and threonine to recognize T. Although these substitution rules are useful
in designing zinc finger proteins, they do not take into account all possible target sites.
Furthermore, the assumption underlying the rules, namely that a particular amino acid in
a zinc finger is responsible for binding to a particular base in a subsite, is only
approximate. Context-dependent interactions between proximate amino acids in a finger
or binding of multiple amino acids to a single base or vice versa can cause variation of the
binding specificities predicted by the existing substitution rules.
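The substitution rules listed above can be collected into a simple lookup, sketched below. The data structure and function name are illustrative only; where the rules as stated give no guidance (for example, a 5' base other than G), the sketch returns None, and it makes no attempt to model the context-dependent effects just described.

```python
from typing import Dict, Optional

# Rules as stated in the text, keyed by the base to be recognized.
POSITION_MINUS_1 = {"G": "R", "A": "Q", "C": "E/D", "T": "T"}   # 3' base
POSITION_3 = {"G": "H", "A": "N", "C": "D/E/S/T", "T": "A"}     # middle base
# For a middle T the text names small side chains "such as alanine";
# only alanine is listed here.
POSITION_6 = {"G": "R"}                                         # 5' base


def suggest_residues(triplet_5to3: str) -> Dict[str, Optional[str]]:
    """Suggest residues at positions -1, +3 and +6 for a triplet (5' to 3')."""
    five_prime, middle, three_prime = triplet_5to3.upper()
    return {
        "-1": POSITION_MINUS_1.get(three_prime),
        "+3": POSITION_3.get(middle),
        "+6": POSITION_6.get(five_prime),
    }


print(suggest_residues("GCG"))
```

For the triplet GCG the lookup suggests arginine at -1, one of D/E/S/T at +3 and arginine at +6, which agrees with the consensus RXDEXXR recited in the Summary for this subsite (E at position +3).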

The technique of phage display provides a largely empirical means of
generating zinc finger proteins with a desired target specificity (see e.g., Rebar, US
5,789,538; Choo et al., WO 96/06166; Barbas et al., WO 95/19431 and WO 98/54311;
Jamieson et al., supra). The method can be used in conjunction with, or as an alternative
to, rational design. The method involves the generation of diverse libraries of
mutagenized zinc finger proteins, followed by the isolation of proteins with desired
DNA-binding properties using affinity selection methods. To use this method, the experimenter
typically proceeds as follows. First, a gene for a zinc finger protein is mutagenized to
introduce diversity into regions important for binding specificity and/or affinity. In a
typical application, this is accomplished via randomization of a single finger at positions
-1, +2, +3, and +6, and sometimes accessory positions such as +1, +5, +8 and +10. Next,
the mutagenized gene is cloned into a phage or phagemid vector as a fusion with gene III
of a filamentous phage, which encodes the coat protein pIII. The zinc finger gene is
inserted between segments of gene III encoding the membrane export signal peptide and
the remainder of pIII, so that the zinc finger protein is expressed as an amino-terminal
fusion with pIII in the mature, processed protein. When using phagemid vectors, the
mutagenized zinc finger gene may also be fused to a truncated version of gene III
encoding, minimally, the C-terminal region required for assembly of pIII into the phage
particle. The resultant vector library is transformed into E. coli and used to produce
filamentous phage which express variant zinc finger proteins on their surface as fusions
with the coat protein pIII. If a phagemid vector is used, then this step requires
superinfection with helper phage. The phage library is then incubated with a target DNA
site, and affinity selection methods are used to isolate phage which bind target with high
affinity from bulk phage. Typically, the DNA target is immobilized on a solid support,
which is then washed under conditions sufficient to remove all but the tightest binding
phage. After washing, any phage remaining on the support are recovered via elution
under conditions which disrupt zinc finger-DNA binding. Recovered phage are used to
infect fresh E. coli, which is then amplified and used to produce a new batch of phage
particles. Selection and amplification are then repeated as many times as is necessary to
enrich the phage pool for tight binders such that these may be identified using sequencing
and/or screening methods. Although the method is illustrated for pIII fusions, analogous
principles can be used to screen ZFP variants as pVIII fusions.

In certain embodiments, the sequence bound by a particular zinc finger
protein is determined by conducting binding reactions (see, e.g., conditions for
determination of Kd, infra) between the protein and a pool of randomized double-stranded
oligonucleotide sequences. The binding reaction is analyzed by an electrophoretic
mobility shift assay (EMSA), in which protein-DNA complexes undergo retarded
migration in a gel and can be separated from unbound nucleic acid. Oligonucleotides
which have bound the finger are purified from the gel and amplified, for example, by a
polymerase chain reaction. The selection (i.e. binding reaction and EMSA analysis) is
then repeated as many times as desired, with the selected oligonucleotide sequences. In
this way, the binding specificity of a zinc finger protein having a particular amino acid
sequence is determined.

Zinc finger proteins are often expressed with a heterologous domain as
fusion proteins. Common domains for addition to the ZFP include, e.g., transcription
factor domains (activators, repressors, co-activators, co-repressors), silencers, oncogenes
(e.g., myc, jun, fos, myb, max, mad, rel, ets, bcl, myb, mos family members etc.); DNA
repair enzymes and their associated factors and modifiers; DNA rearrangement enzymes
and their associated factors and modifiers; chromatin associated proteins and their
modifiers (e.g. kinases, acetylases and deacetylases); and DNA modifying enzymes (e.g.,
methyltransferases, topoisomerases, helicases, ligases, kinases, phosphatases,
polymerases, endonucleases) and their associated factors and modifiers. A preferred
domain for fusing with a ZFP when the ZFP is to be used for repressing expression of a
target gene is a KRAB repression domain from the human KOX-1 protein (Thiesen et al.,
New Biologist 2, 363-374 (1990); Margolin et al., Proc. Natl. Acad. Sci. USA 91,
4509-4513 (1994); Pengue et al., Nucl. Acids Res. 22:2908-2914 (1994); Witzgall et al., Proc.
Natl. Acad. Sci. USA 91, 4514-4518 (1994)). Preferred domains for achieving activation
include the HSV VP16 activation domain (see, e.g., Hagmann et al., J. Virol. 71,
5952-5962 (1997)); nuclear hormone receptors (see, e.g., Torchia et al., Curr. Opin. Cell Biol.
10:373-383 (1998)); the p65 subunit of nuclear factor kappa B (Bitko & Barik, J. Virol.
72:5610-5618 (1998); Doyle & Hunt, Neuroreport 8:2937-2942 (1997); Liu et al.,
Cancer Gene Ther. 5:3-28 (1998)); or artificial chimeric functional domains such as
VP64 (Seipel et al., EMBO J. 11, 4961-4968 (1992)).

An important factor in the administration of polypeptide compounds, such
as the ZFPs, is ensuring that the polypeptide has the ability to traverse the plasma
membrane of a cell, or the membrane of an intracellular compartment such as the
nucleus. Cellular membranes are composed of lipid-protein bilayers that are freely
permeable to small, nonionic lipophilic compounds and are inherently impermeable to
polar compounds, macromolecules, and therapeutic or diagnostic agents. However,
proteins and other compounds such as liposomes have been described, which have the
ability to translocate polypeptides such as ZFPs across a cell membrane.

For example, "membrane translocation polypeptides" have amphiphilic or
hydrophobic amino acid subsequences that have the ability to act as membrane-translocating
carriers. In one embodiment, homeodomain proteins have the ability to
translocate across cell membranes. The shortest internalizable peptide of a homeodomain
protein, Antennapedia, was found to be the third helix of the protein, from amino acid
position 43 to 58 (see, e.g., Prochiantz, Current Opinion in Neurobiology 6:629-634
(1996)). Another subsequence, the h (hydrophobic) domain of signal peptides, was found
to have similar cell membrane translocation characteristics (see, e.g., Lin et al., J. Biol.
Chem. 270:14255-14258 (1995)).

Examples of peptide sequences which can be linked to a ZFP of the
invention, for facilitating uptake of ZFP into cells, include, but are not limited to: an 11
amino acid peptide of the tat protein of HIV; a 20 residue peptide sequence which
corresponds to amino acids 84-103 of the p16 protein (see Fahraeus et al., Current
Biology 6:84 (1996)); the third helix of the 60-amino acid long homeodomain of
Antennapedia (Derossi et al., J. Biol. Chem. 269:10444 (1994)); the h region of a signal
peptide such as the Kaposi fibroblast growth factor (K-FGF) h region (Lin et al., supra);
or the VP22 translocation domain from HSV (Elliott & O'Hare, Cell 88:223-233 (1997)).
Other suitable chemical moieties that provide enhanced cellular uptake may also be
chemically linked to ZFPs.

Toxin molecules also have the ability to transport polypeptides across cell
membranes. Often, such molecules are composed of at least two parts (called "binary
toxins"): a translocation or binding domain or polypeptide and a separate toxin domain or
polypeptide. Typically, the translocation domain or polypeptide binds to a cellular
receptor, and then the toxin is transported into the cell. Several bacterial toxins, including
Clostridium perfringens iota toxin, diphtheria toxin (DT), Pseudomonas exotoxin A (PE),
pertussis toxin (PT), Bacillus anthracis toxin, and pertussis adenylate cyclase (CYA),
have been used in attempts to deliver peptides to the cell cytosol as internal or amino-terminal
fusions (Arora et al., J. Biol. Chem. 268:3334-3341 (1993); Perelle et al., Infect.
Immun. 61:5147-5156 (1993); Stenmark et al., J. Cell Biol. 113:1025-1032 (1991);
Donnelly et al., PNAS 90:3530-3534 (1993); Carbonetti et al., Abstr. Annu. Meet. Am.
Soc. Microbiol. 95:295 (1995); Sebo et al., Infect. Immun. 63:3851-3857 (1995); Klimpel
et al., PNAS U.S.A. 89:10277-10281 (1992); and Novak et al., J. Biol. Chem. 267:17186-17193
(1992)).

Such subsequences can be used to translocate ZFPs across a cell
membrane. ZFPs can be conveniently fused to or derivatized with such sequences.
Typically, the translocation sequence is provided as part of a fusion protein. Optionally, a
linker can be used to link the ZFP and the translocation sequence. Any suitable linker can
be used, e.g., a peptide linker.

Production of ZFPs 

ZFP polypeptides and nucleic acids encoding the same can be made using
routine techniques in the field of recombinant genetics. Basic texts disclosing the general
methods of use in this invention include Sambrook et al., Molecular Cloning, A
Laboratory Manual (2nd ed. 1989); Kriegler, Gene Transfer and Expression: A
Laboratory Manual (1990); and Current Protocols in Molecular Biology (Ausubel et al.,
eds., 1994). In addition, nucleic acids less than about 100 bases can be custom ordered
from any of a variety of commercial sources, such as The Midland Certified Reagent
Company (mcrc@oligos.com), The Great American Gene Company
(http://www.genco.com), ExpressGen Inc. (www.expressgen.com), and Operon Technologies
Inc. (Alameda, CA). Similarly, peptides can be custom ordered from any of a variety of
sources, such as PeptidoGenic (pkim@ccnet.com), HTI Bio-products, Inc.
(http://www.htibio.com), BMA Biomedicals Ltd (U.K.), and Bio.Synthesis, Inc.

Oligonucleotides can be chemically synthesized according to the solid
phase phosphoramidite triester method first described by Beaucage & Caruthers,
Tetrahedron Letts. 22:1859-1862 (1981), using an automated synthesizer, as described in
Van Devanter et al., Nucleic Acids Res. 12:6159-6168 (1984). Purification of
oligonucleotides is by either denaturing polyacrylamide gel electrophoresis or by reverse
phase HPLC. The sequence of the cloned genes and synthetic oligonucleotides can be
verified after cloning using, e.g., the chain termination method for sequencing double-stranded
templates of Wallace et al., Gene 16:21-26 (1981).

Two alternative methods are typically used to create the coding sequences
required to express newly designed DNA-binding peptides. One protocol is a PCR-based
assembly procedure that utilizes six overlapping oligonucleotides (Fig. 1). Three
oligonucleotides (oligos 1, 3, and 5 in Fig. 1) correspond to "universal" sequences that
encode portions of the DNA-binding domain between the recognition helices. These
oligonucleotides typically remain constant for all zinc finger constructs. The other three
"specific" oligonucleotides (oligos 2, 4, and 6 in Fig. 1) are designed to encode the
recognition helices. These oligonucleotides contain substitutions primarily at positions
-1, 2, 3 and 6 on the recognition helices, making them specific for each of the different
DNA-binding domains.
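
As a rough illustration of the position numbering used for the recognition helices, the following Python sketch labels the residues of a seven-residue helix and extracts those at positions -1, 2, 3 and 6. It assumes that a seven-letter helix sequence such as RSDNLAR lists positions -1, 1, 2, 3, 4, 5 and 6 in order; that ordering, the function name contact_residues, and the choice of example sequence are assumptions made for illustration and are not stated in this passage.

```python
# Assumed numbering: a 7-residue recognition helix lists positions -1, 1, 2, 3, 4, 5, 6.
HELIX_POSITIONS = (-1, 1, 2, 3, 4, 5, 6)
# Positions at which the "specific" oligonucleotides primarily carry substitutions.
CONTACT_POSITIONS = (-1, 2, 3, 6)


def contact_residues(helix):
    """Return {position: residue} for positions -1, 2, 3 and 6 of a 7-residue helix."""
    if len(helix) != len(HELIX_POSITIONS):
        raise ValueError("expected a 7-residue recognition helix")
    by_position = dict(zip(HELIX_POSITIONS, helix.upper()))
    return {pos: by_position[pos] for pos in CONTACT_POSITIONS}


if __name__ == "__main__":
    # Hypothetical usage with one helix sequence of the kind listed in the tables.
    print(contact_residues("RSDNLAR"))
```

Under the assumed numbering, two helices that differ only at positions 1, 4 or 5 return the same dictionary, which is one simple way to group designs by their principal base-contacting residues.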

The PCR synthesis is carried out in two steps. First, a double stranded
DNA template is created by combining the six oligonucleotides (three universal, three
specific) in a four cycle PCR reaction with a low temperature annealing step, thereby
annealing the oligonucleotides to form a DNA "scaffold." The gaps in the scaffold are
filled in by a high-fidelity thermostable polymerase; a combination of Taq and Pfu
polymerases also suffices. In the second phase of construction, the zinc finger template is
amplified by external primers designed to incorporate restriction sites at either end for
cloning into a shuttle vector or directly into an expression vector.

An alternative method of cloning the newly designed DNA-binding
proteins relies on annealing complementary oligonucleotides encoding the specific
regions of the desired ZFP. This particular application requires that the oligonucleotides
be phosphorylated prior to the final ligation step. This is usually performed before setting
up the annealing reactions. In brief, the "universal" oligonucleotides encoding the
constant regions of the proteins (oligos 1, 2 and 3 of above) are annealed with their
complementary oligonucleotides. Additionally, the "specific" oligonucleotides encoding
the finger recognition helices are annealed with their respective complementary
oligonucleotides. These complementary oligos are designed to fill in the region which
was previously filled in by polymerase in the above-mentioned protocol. The
complementary oligos to the common oligos 1 and finger 3 are engineered to leave
overhanging sequences specific for the restriction sites used in cloning into the vector of
choice in the following step. The second assembly protocol differs from the initial
protocol in the following aspects: the "scaffold" encoding the newly designed ZFP is
composed entirely of synthetic DNA, thereby eliminating the polymerase fill-in step;
additionally, the fragment to be cloned into the vector does not require amplification.
Lastly, the design of leaving sequence-specific overhangs eliminates the need for
restriction enzyme digests of the inserting fragment. Alternatively, changes to ZFP
recognition helices can be created using conventional site-directed mutagenesis methods.

Both assembly methods require that the resulting fragment encoding the
newly designed ZFP be ligated into a vector. Ultimately, the ZFP-encoding sequence is
cloned into an expression vector. Expression vectors that are commonly utilized include,
but are not limited to, a modified pMAL-c2 bacterial expression vector (New England
BioLabs) or a eukaryotic expression vector, pcDNA (Promega). The final constructs are
verified by sequence analysis.

Any suitable method of protein purification known to those of skill in the 
art can be used to purify ZFPs of the invention (see, Ausubel, supra, Sambrook, supra). 
In addition, any suitable host can be used for expression, e.g., bacterial cells, insect cells, 
yeast cells, mammalian cells, and the like. 

Expression of a zinc finger protein fused to a maltose binding protein
(MBP-ZFP) in bacterial strain JM109 allows for straightforward purification through an
amylose column (NEB). High expression levels of the zinc finger chimeric protein can
be obtained by induction with IPTG since the MBP-ZFP fusion in the pMal-c2 expression
plasmid is under the control of the tac promoter (NEB). Bacteria containing the MBP-ZFP
fusion plasmids are inoculated into 2xYT medium containing 10 µM ZnCl2, 0.02%
glucose, plus 50 µg/ml ampicillin and shaken at 37°C. At mid-exponential growth IPTG
is added to 0.3 mM and the cultures are allowed to shake. After 3 hours the bacteria are
harvested by centrifugation, disrupted by sonication or by passage through a French
pressure cell or through the use of lysozyme, and insoluble material is removed by
centrifugation. The MBP-ZFP proteins are captured on an amylose-bound resin, washed
extensively with buffer containing 20 mM Tris-HCl (pH 7.5), 200 mM NaCl, 5 mM DTT
and 50 µM ZnCl2, then eluted with maltose in essentially the same buffer (purification is
based on a standard protocol from NEB). Purified proteins are quantitated and stored for
biochemical analysis.



The dissociation constants of the purified proteins, e.g., Kd, are typically
characterized via electrophoretic mobility shift assays (EMSA) (Buratowski & Chodosh,
in Current Protocols in Molecular Biology pp. 12.2.1-12.2.7 (Ausubel ed., 1996)).
Affinity is measured by titrating purified protein against a fixed amount of labeled
double-stranded oligonucleotide target. The target typically comprises the natural
binding site sequence flanked by the 3 bp found in the natural sequence and additional,
constant flanking sequences. The natural binding site is typically 9 bp for a three-finger
protein and 2 x 9 bp + intervening bases for a six finger ZFP. The annealed
oligonucleotide targets possess a 1 base 5' overhang which allows for efficient labeling of
the target with T4 phage polynucleotide kinase. For the assay the target is added at a
concentration of 1 nM or lower (the actual concentration is kept at least 10-fold lower
than the expected dissociation constant), purified ZFPs are added at various
concentrations, and the reaction is allowed to equilibrate for at least 45 min. In addition
the reaction mixture also contains 10 mM Tris (pH 7.5), 100 mM KCl, 1 mM MgCl2, 0.1
mM ZnCl2, 5 mM DTT, 10% glycerol, 0.02% BSA. (NB: in earlier assays poly d(IC)
was also added at 10-100 µg/µl.)

The equilibrated reactions are loaded onto a 10% polyacrylamide gel,
which has been pre-run for 45 min in Tris/glycine buffer, then bound and unbound
labeled target are resolved by electrophoresis at 150V (alternatively, 10-20% gradient
Tris-HCl gels, containing a 4% polyacrylamide stacker, can be used). The dried gels are
visualized by autoradiography or phosphorimaging and the apparent Kd is determined by
calculating the protein concentration that gives half-maximal binding.

The assays can also include determining active fractions in the protein
preparations. Active fractions are determined by stoichiometric gel shifts where proteins
are titrated against a high concentration of target DNA. Titrations are done at 100, 50,
and 25% of target (usually at micromolar levels).
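
The half-maximal binding estimate of apparent Kd used in the EMSA analysis can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name apparent_kd, the log-linear interpolation between the two bracketing titration points, and the example numbers are all hypothetical and are not taken from the specification, which says only that the apparent Kd is the protein concentration giving half-maximal binding. The estimate is meaningful only when the labeled target is kept well below the expected Kd, as the assay conditions require.

```python
import math


def apparent_kd(protein_nM, fraction_bound):
    """Estimate the protein concentration (nM) giving half-maximal binding.

    Interpolates linearly in log10(concentration) between the two titration
    points that bracket half of the maximum observed fraction bound.
    All concentrations must be positive.
    """
    points = sorted(zip(protein_nM, fraction_bound))
    half = max(f for _, f in points) / 2.0
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= half <= f_hi and f_hi > f_lo:
            t = (half - f_lo) / (f_hi - f_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    raise ValueError("titration does not bracket half-maximal binding")


if __name__ == "__main__":
    # Hypothetical titration: protein concentrations (nM) and quantified
    # fraction of labeled target shifted at each concentration.
    concentrations = [0.1, 1.0, 10.0, 100.0]
    bound = [0.0, 0.2, 0.8, 1.0]
    print(f"apparent Kd ~ {apparent_kd(concentrations, bound):.2f} nM")
```

A fitted binding isotherm would normally be preferred when enough titration points are available; the interpolation here is chosen only because it mirrors the half-maximal definition directly.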


III. Applications of Designed ZFPs 

ZFPs that bind to a particular target gene, and the nucleic acids encoding
them, can be used for a variety of applications. These applications include therapeutic
methods in which a ZFP or a nucleic acid encoding it is administered to a subject and
used to modulate the expression of a target gene within the subject (see copending
application Townsend & Townsend & Crew Attorney Docket 019496-002200, filed
January 12, 1999). The modulation can be in the form of repression, for example, when
the target gene resides in a pathological infecting microorganism, or in an endogenous
gene of the patient, such as an oncogene or viral receptor, that is contributing to a disease
state. Alternatively, the modulation can be in the form of activation when activation of
expression or increased expression of an endogenous cellular gene can ameliorate a
diseased state. For such applications, ZFPs, or more typically, nucleic acids encoding
them, are formulated with a pharmaceutically acceptable carrier as a pharmaceutical
composition.

Pharmaceutically acceptable carriers are determined in part by the
particular composition being administered, as well as by the particular method used to
administer the composition (see, e.g., Remington's Pharmaceutical Sciences, 17th ed.
1985). The ZFPs, alone or in combination with other suitable components, can be made
into aerosol formulations (i.e., they can be "nebulized") to be administered via inhalation.
Aerosol formulations can be placed into pressurized acceptable propellants, such as
dichlorodifluoromethane, propane, nitrogen, and the like. Formulations suitable for
parenteral administration, such as, for example, by intravenous, intramuscular,
intradermal, and subcutaneous routes, include aqueous and non-aqueous, isotonic sterile
injection solutions, which can contain antioxidants, buffers, bacteriostats, and solutes that
render the formulation isotonic with the blood of the intended recipient, and aqueous and
non-aqueous sterile suspensions that can include suspending agents, solubilizers,
thickening agents, stabilizers, and preservatives. Compositions can be administered, for
example, by intravenous infusion, orally, topically, intraperitoneally, intravesically or
intrathecally. The formulations of compounds can be presented in unit-dose or multi-dose
sealed containers, such as ampules and vials. Injection solutions and suspensions
can be prepared from sterile powders, granules, and tablets of the kind previously
described.

The dose administered to a patient should be sufficient to effect a
beneficial therapeutic response in the patient over time. The dose is determined by the
efficacy and Kd of the particular ZFP employed, the target cell, and the condition of the
patient, as well as the body weight or surface area of the patient to be treated. The size of
the dose also is determined by the existence, nature, and extent of any adverse side-effects
that accompany the administration of a particular compound or vector in a particular
patient.

In other applications, ZFPs are used in diagnostic methods for sequence
specific detection of target nucleic acid in a sample. For example, ZFPs can be used to
detect variant alleles associated with a disease or phenotype in patient samples. As an
example, ZFPs can be used to detect the presence of particular mRNA species or cDNA
in a complex mixture of mRNAs or cDNAs. As a further example, ZFPs can be used to
quantify copy number of a gene in a sample. For example, detection of loss of one copy
of a p53 gene in a clinical sample is an indicator of susceptibility to cancer. In a further
example, ZFPs are used to detect the presence of pathological microorganisms in clinical
samples. This is achieved by using one or more ZFPs specific to genes within the
microorganism to be detected. A suitable format for performing diagnostic assays
employs ZFPs linked to a domain that allows immobilization of the ZFP on an ELISA
plate. The immobilized ZFP is contacted with a sample suspected of containing a target
nucleic acid under conditions in which binding can occur. Typically, nucleic acids in the
sample are labeled (e.g., in the course of PCR amplification). Alternatively, unlabelled
probes can be detected using a second labelled probe. After washing, bound-labelled
nucleic acids are detected.

ZFPs also can be used for assays to determine the phenotype and function
of gene expression. Current methodologies for determination of gene function rely
primarily upon either overexpression or removing (knocking out completely) the gene of
interest from its natural biological setting and observing the effects. The phenotypic
effects observed indicate the role of the gene in the biological system.

One advantage of ZFP-mediated regulation of a gene relative to
conventional knockout analysis is that expression of the ZFP can be placed under small
molecule control. By controlling expression levels of the ZFPs, one can in turn control
the expression levels of a gene regulated by the ZFP to determine what degree of
repression or stimulation of expression is required to achieve a given phenotypic or
biochemical effect. This approach has particular value for drug development. By putting
the ZFP under small molecule control, problems of embryonic lethality and
developmental compensation can be avoided by switching on the ZFP repressor at a later
stage in mouse development and observing the effects in the adult animal. Transgenic
mice having target genes regulated by a ZFP can be produced by integration of the
nucleic acid encoding the ZFP at any site in trans to the target gene. Accordingly,
homologous recombination is not required for integration of the nucleic acid. Further,
because the ZFP is trans-dominant, only one chromosomal copy is needed and therefore
functional knock-out animals can be produced without backcrossing.

All references cited above are hereby incorporated by reference in their
entirety for all purposes.

[illegible]  [illegible]  71      QRSALAR      177     DSGHLTR      283     EKANLTR      389     95
437          GACGGCGTA    72      QRSALAR      178     DSGHLTR      284     ERGNLTR      390     117.5
438          GAGGGGGCG    73      RSDELTR      179     RSDHLTT      285     RSDNLTR      391     62.5
440          GCCGAGGTGC   74      RSDSLLR      180     RSKNLQR      286     ERGTLAR      392     40
441          GGTGGAGTCA   75      DSGSLTR      181     QSGHLQR      287     TSGHLTR      393     250
445          GTCGCAGTGA   76      RSDSLRR      182     QSSDLQK      288     DSGSLTR      394     1000
450          GACTTGGTGC   77      RSDTLAR      183     RGDALTS      289     DRSNLTR      395     130
[illegible]  GGTGGAGTCA   78      DRSALAR      184     QSGHLQR      290     DSSKLSR      396     150
461          GAGTACTGTA   79      QRSHLTT      185     DRSNLRT      291     RSDNLAR      397     120
463          GTGGAGGAGA   80      RSDNLTR      186     RSDNLAR      292     RSDALAR      398     0.5
464          GTGGAGGAGA   81      RSDNLTR      187     RSDNLAR      293     RSDSLAR      399     0.4
466          CAGGCTGCGC   82      RSDDLTR      188     QSSDLQR      294     RSDNLRE      400     65
467          CAGGCTGCGC   83      RSDELTR      189     QSSDLQR      295     RGDHLKD      401     800
468          CAGGCTGCGC   84      RSDDLTR      190     QSSDLQR      296     RGDHLKD      402     42
469          GAAGAGGTCT   85      DRSALAR      191     RSDNLAR      297     QSGNLTR      403     13.5
[illegible]  GAGGTCTGGA   86      RSSHLTT      192     DRSALAR      298     RSDNLAR      404     80
476          GGAGAGGATG   87      TTSNLRR      193     RSDNLAR      299     QSDHLTR      405     80
477          GGAGAGGATG   88      TTSNLRR      194     RSDNLAR      300     QRAHLAR      406     100
478          GGAGAGGATG   89      TTSNLRR      195     RSDNLAR      301     QSGHLRR      407     60
479          GTGGCGGACC   90      DSSNLTR      196     RSDELQR      302     RSDALAR      408     8.5
[illegible]  GTGGCGGACC   91      DSSNLTR      197     [illegible]  303     RSDALAR      409     5
[illegible]  GAGGGCGAAG   92      QSANLAR      198     ESSKLKR      304     RSDNLAR      410     130
[illegible]  GAGGGCGAAG   93      QSDNLAR      199     ESSKLKR      305     RSDNLAR      411     1000
[illegible]  GGAGAGGTTT   94      QSSALAR      200     RSDNLAR      306     QRAHLAR      412     110
[illegible]  GGAGAGGTTT   95      NRATLAR      201     RSDNLAR      307     QSGHLAR      413     76.9
[illegible]  TGGTAGGGGG   96      RSDHLAR      202     RSDNLTT      308     RSDHLTT      414     35
[illegible]  TAGGGGGTGG   97      RSDSLLR      203     RSDHLTR      309     RSDNLTT      415     1.5
[illegible]  GCCGAGGTGC   98      RSDSLLR      204     RSDNLAR      310     ERGTLAR      416     50
[illegible]  GCCGAGGTGC   99      RSDSLLR      205     RSDNLAR      311     DRSDLTR      417     25
[illegible]  GCCGAGGTGC   100     RSDSLLR      206     RSDNLAR      312     DCRDLAR      418     65
[illegible]  GCGGGCGGGC   101     RSDHLTR      207     ERGHLTR      313     RSDTLKK      419     8
543          GAGTGTGTGA   102     RSDLLQR      208     MSHHLKE      314     RSDHLSR      420     50
544          GAGTGTGTGA   103     RSDSLLR      209     MSHHLKE      315     RSDNLAR      421     125
545          GAGTGTGTGA   104     RKDSLVR      210     TSDHLAS      316     RSDNLTR      422     32
546          GAGTGTGTGA   105     RSDLLQR      211     MSHHLKT      317     RLDGLRT      423     500
547          GAGTGTGTGA   106     RKDSLVR      212     TSGHLTS      318     RSDNLTR      424     500
548          GAGTGTGTGA   107     RSSLLQR      213     MSHHLKT      319     RSDHLSR      425     500
549          GAGTGTGTGA   108     RSSLLQR      214     MSHHLKE      320     RSDHLSR      426     500
550          GAGTGTGTGA   109     RKDSLVR      215     TKDHLAS      321     RSDNLTR      427     20
551          GAGTGTGTGA   110     RSDLLQR      216     MSHHLKT      322     RSDHLSR      428     50
552          GAGTGTGTGA   111     RKDSLVR      217     MSHHLKT      323     RSDNLTR      429     31
553          GAGTGTGTGA   112     RSDSLLR      218     MSHHLKE      324     RSDNLTR      430     125
554          GAGTGTGTGA   113     RKDSLVR      219     TSDHLAS      325     RSDNLAR      431     62.5
558          TGCGGGGCA    114     QSGDLTR      220     RSDHLTR      326     DSGHLAS      432     21
559          GAGTGTGTGA   115     RSDSLLR      221     TSDHLAS      327     RSDNLAR      433     1000
560          GAGTGTGTGA   116     RSSLLQR      222     MSHHLKT      328     RSDHLSR      434     500
561          GAGTGTGTGA   117     RKDSLVR      223     MSHHLKE      329     RSDNLAR      435     1000
562          GAGTGTGTGA   118     RSDSLLR      224     TSGHLTS      330     RSDNLAR      436     1000
565          GATGCTGAG    119     RSDNLTR      225     TSSELQR      331     QQSNLAR      437     100
567          GAAGATGAC    120     EKANLTR      226     TSANLSR      332     QRSNLVR      438     47.5
568          GATGACGAC    121     EKANLTR      227     DSSNLTR      333     TSANLSR      439     300
569          GTAGTTGTG    122     RSDSLLR      228     TGGSLAR      334     QRSALTR      440     52

TABLE 2 



SBS# 




SEQ SE£ 




CTO 


lVQ 




ID 


Fl ID 


r Z ±u 


do t "Pi 
r j 1JJ 


[TIM} 


2 01 




441 


RSDSLTS 64 6 


T7PQTT. r PD QC1 
CjKc IJjIK o Jl 


vJKAUxjKK lUbo 


1UUU 


9 09 


\JV_riO^\_ J. X vJT 


442 


RSDSLTS 64 7 




yKAJJLAR 1057 


1 A A A 

10 0 0 


9 01 

U J 


n p a ac p t t n 


443 


RSDSLTS 648 


JcjK.o1.Li IK ojj 


/^\D TVPT DTI I n r n 


"i f\ f\ r\ 

10 0 0 


2 04 

£t \J ^ 




444 


RSDSLTS 64 9 


T7" "Q QTT TD Q C A 

HjKo 1 J_i 1 K o j4 


QKA-ILAR 1059 


10 0 0 


V J 




445 


QSANLAR 65 0 


OOATT AD O UZ CZ 

S^o/\ 1 lxA.K bob 


KbDNLSR 1060 


80 


2 0 6 

^ W 


PAPPTAPA A 


446 


QSANLAR 651 


^ o/\ V 1_lMJK bob 


KbJJJNLSR 10 61 


10 0 0 


2 0 7 


PAPTPPT'TA 


447 


QRASLAS 652 


DCHUT TT O CT 'I 

KoJJhLLll ob / 


RSDNLAR 10 62 


70 


2 0 8 


TAPPTPTTA 


448 


QRASLAS 653 


"PlO OAT A "D QCO 

JJKbALAK obo 


RSDNLAS 10 63 


1000 


2 0 9 


PPAPTPPTT 


449 


QSSALAR 654 


DCr\AT 7\D o c n 

KoJJA.J_iAK oby 


/~\ T~} 71 T TT "TV T"1 1 C\ /~ A 

QRAHLAR 10 64 


35 


210 


PPAPTPPTT 


450 


NRDTLAR 655 


DCFlAT ad o a C\ 
KoJJAxjAK obU 


QRAHLAR 1065 


65 


211 


PPAPTPPTT 

VJVJ.tt.Vj X VJVJ X X 


451 


QSSALAR 656 


DCHA T AC O/Ti 

KolJALAb obi. 


QRAHLAR 106 6 


14 0 


212 


GGAPTPPTT 


452 


NRDTLAR 657 


pcnST AC QCO 

rCoiJ/ixxfio obz 


/■~\t~> a tut 7\ n n r\ *~7 


4 0 0 


213 


GTTPPTPPA 


453 


QRAHLAR 658 


HCCTT AD Q /T O 


QbbALAR 10 6 8 


1000 


214 


PTTPPTPPA 

VJ X X V_J v_ X VJVJ.tt. 


454 


QRAHLAR 659 


/~\ O O TT AD O A 

ybblJjAK ob4 


■KTTTTMT1T "A t~\ -i /~i ^ r"\ 

NRDTLAR 10 6 9 


1000 


215 


PAAPTPTPT 

\JX^C^\J X \ X VJ X 


455 


NRDHLMV 66 0 


"PiD OAT AD O^C 

UKbAijAK ODD 


QSANLSR 107 0 


1000 


216 


GAAGTCTGT 

v_j x v . x vj x 


456 


NRDHLTT 661 


T^D OAT AD O /T 

UKoAJjAK bob 


QSANLSR 1071 


1000 


217 


GAGGTPPTA 


457 


QRSALAR 662 


T~>D CAT AD Q n 

JJKb/\J_xA.K ob / 


RSDNLAR 10 72 


40 


219 


GATGTTPAT 

vJ^i. X VJ X X VJJ^i. x 


458 


QQSNLAR 6 63 


"NT"D P|TT AD Q £" Q 

IN KJJ 1 1j/\K obo 


NRDNLSR 10 73 


1000 


220 


GATGTTPAT 

X V_7 X X VJT^i. X 


459 


QQSNLAR 664 


MDHTT AD O^TQ 


QQSNLSR 10 74 


1000 


221 


PA TP APT A P 

vj.z-i x vj.tt.vj x rl^ 


A cz n 
^tbU 


"Pi "DC "NTT DT /T £T CT 

UKbiMxjKl bob 


D C ntvTT AD O ^ A 

KoUINxjAK o / U 


NRDNLAR 10 75 


1000 


222 


GATGAGTAC 


461 


ERSNLRT 666 


RSDNLAR 871 


NRDNLAR 1076 


1000 


223 


GATGAGTAC 


462 


DRSNLRT 667 


RSDNLAR 8 72 


QQSNLAR 10 77 


105 


224 


GATGAGTAC 


463 


ERSNLRT 668 


RSDNLAR 8 73 


QQSNLAR 10 78 


1000 


225 


TGGGAGGTC 


464 


DRSALAR 66 9 


RSDNLAR 8 74 


RSDHLTT 10 79 


6 


226 


GCAGCCTTG 


465 


RGDALTS 670 


ERGTLAR 8 75 


QSGSLTR 1080 


1000 


227 


GCAGCCTTG 


466 


RGDALTV 6 71 


ERGTLAR 8 7 6 


QSGSLTR 1081 


1000 


228 


GCAGCCTTG 


467 


RGDALTM 6 72 


ERGTLAR 877 


QSGSLTR 1082 


1000 


229 


GCAGCCTTG 


468 


RGDALTS 6 73 


ERGTLAR 8 7 8 


RSDELTR 10 83 


1000 



ft ft ft 

23 0 


GCAGCCTTG 


469 


RGDALTV 6 74 


231 


GCAGCCTTG 


470 


RGDALTM 675 


232 


GGTGTGGTG 


471 


RSDALTR 676 


2 3 3 


z — i /—i m/"i n-i/~"i /^i ms~i 

GGTGTGGTG 


472 


RSDALTR 677 


2 3 5 


GTAGAGGTG 


473 


RSDALTR 67 8 


2 3 6 


GGGGAGGGG 


4 74 


RSDHLAR 679 


O O *7 

2 3/ 


GGGGAGGC c 


4 75 


ERGTLAR 6 8 0 


2 3 o 


ggggaggc c 


4 76 


ERGTLAR 681 


O "5 o 

2 3 9 


GGCGGGGAG 


4 77 


RSDNLTR 682 


2 4 0 


GCAGGGGAG 


4 78 


RSDNLTR 6 83 


2 42 


GGGGGTGCT 


479 


QSSDLRR 684 


24 3 


GTGGGCGCT 


480 


QSSDLRR 68 5 


244 


m TV TV /™1 TV TV /""I /"I 

TAAGAAGGG 


4 81 


RSDHLAR 68 6 


2 4 5 


rri TV TV /~1 TV TV /"""* / — 1 /—T 

TAAGAAGGG 


4 82 


RSDHLAR 687 


24 6 


TV TV f~~\ /—I /^| TV /—i 

GAAGGGGAG 


483 


RSDNLAR 688 


24 7 


/~1 TV TV /~1 /~1 / — 1 i TV /~1 

GAAGGGGAG 


484 


RSDNLAR 68 9 


O "7 ^ 

z / o 


GCGGCCGCG 


4 85 


RSDELTR 690 


O "7 F 7 

Z / / 


CjCGGCCGCG 


4 8 6 


RSDELTR 6 91 


O 7 Q 

Z / o 


/-« /™f Oi /-i /'-i /^i >ft) /-I /-( 


4 8 7 


QSWELTR 6 92 


O *"7 Q 

Z / y 


r~*(~*r~* r~~\ r~\ 
bLbbLLbLG 


4 8 8 


QSWELTR 6 93 


o q n 
Z o U 


CjCGGCCGCG 


4 8 9 


QSGSLTR 694 


2 o 1 


GCGGCCGCG 


4 90 


QSGSLTR 695 


O Q O 

z 82 


GCAGAAGTG 


4 91 


RGDALTR 696 


o o o 


GCAGAAGTG 


4 92 


RSDALTR 697 


2 o4 


GCGGCCGCG 


4 93 


QSGSLTR 698 


O Q C 

2 o b 


1 G1GCGGCC 


4 94 


ERGTLAR 69 9 


*5 Q *7 
2 O / 


/'""I /-* TV /""I TV TV /~1 /~1 /~1 

G C AGAAG C G 


4 95 


RGPDLAR 700 


o o o 

2 88 


/~1 T\ /~1 TV TV /" — 1 /—i / — 1 

GCAGAAGCG 


4 96 


RGPDLAR 701 


289 


GCAGAAGCG 


497 


RGPDLAR 702 


290 


GCAGAAGCG 


498 


RSDELAR 7 03 


292 


GCAGAAGCG 


499 


RSDELTR 7 04 


293 


GTGTGCGGC 


500 


DRSHLTR 705 


296 


TGCGCGGCC 


501 


ERGTLAR 706 



T~l T~» /"I (TIT TV T~l ft n O 

ERGTLAR 879 


T — \ f-~i T~l X m -t ft ft >i 

RSDELTR 10 84 


1000 


ERGTLAR 88 0 


RSDELTR 108 5 


1000 


RSDALAR 881 


NRSHLAR 1086 


50 


RSDALAR 882 


TV HTTT "TV T~\ "1 ft ft ' — 1 

QASHLAR 108 7 


100 


RSDNLAR 883 


>^**v T^V r***! TV T" TV T~\ ^ /-v 

QRGALAR 1088 


80 


KdUJMLAK oo4 


n o n>T T t on i r\ o o 

RSDHLSR 108 9 


0 . 3 


RSDNLAR 88 5 


T~> O T~\ T T X O n T /"\ O ft 

RSDHLSR 10 90 


0 . 3 


T~» f""* T"VTVTT /™\T~V ft ft 

RSDNLQR 886 


RSDHLSR 1091 


0 . 8 


t*i nrvTTT m ~r~\ ft ft ' — 7 

RSDHLTR 887 


T**\ T^V /^l TTT TV T^ 1 /"\ /^v 

DRSHLAR 10 92 


0 . 4 


RSDHLSR 888 


/ — t T mx^ "1 ft ft ft 

QSGSLTR 10 93 


1 


/-I / — 1 T T X TV "n ft ft ft 

QSSHLAR 88 9 


RSDHLSR 10 94 


1 


T>, T~> PTTT TV n ft ft f\ 

DRbHLAR 890 


RSDALAR 10 95 


75 


Af/^TVTT l~n T~1 ft ft i 

QSGNLTR 891 


QSGNLRT 10 96 


100 


/•~\ /~i tv TTT mi~i ft ft ft 

QSANLTR 8 92 


QSGNLRT 10 97 


235 


RSDHLAR 8 93 


QSGNLTR 10 98 


2 


RSDHLAR 8 94 


QSGNLRR 10 99 


2 


ERGTLAR 8 95 


RSDERKR 1100 


90 


xvn e~t t rnT~i ft ft 

DRSSLTR 8 96 


RSDERKR 1101 


107 


hiKGlLAR 8 97 


RSDERKR 1102 


190 


T~\T~) COT rpn ft ft ft 

UKbbLIR 89 8 


RSDERKR 1103 


260 


■nT*»/~imx tv t^i ft ft ft 

ERGTLAR 8 99 


RSDERKR 1104 


160 


T~M~i fi f i x rxiT™v ft ft ft 

DRSSLTR 90 0 


RSDERKR 1105 


225 


rvn tv tvtt rpn fv ft -i 

QSANLTR 901 


X™\ /^i TV T^\ T T\ ^ *1 /-\ 

QSADLAR 1106 


1000 


/~s /~i tvTT rxi T~) /^v ft 

QSGNLTR 902 


QSGSLTR 1107 


2 


T!) CTMTT mm ft ft ft 

RSDHLTT 903 


RSDERKR 1108 


1000 


KbDELTR 904 


n T^TMTT "1 -1 ft ft 

SRDHLQS 1109 


1000 


O 7V TVTT T>n ft ft r" 

QSANLTR 9 05 


QSGSLTR 1110 


1000 


/— \ /— 1 TV VTT mT^V ft ft y" 

QSANLTR 9 06 


QSGSLTR 1111 


1000 


QSGNLQR 90 7 


QSGSLTR 1112 


800 


QSANLQR 908 


QSADLAR 1113 


1000 


QSANLQR 909 


QSGSLTR 1114 


1000 


ERHSLQT 910 


RSDALTR 1115 


320 


RSDELTR 911 


DRDHLQS 1116 


1000 




27 



297 


TGCGCGGCC 


502 


ERGTLAR 70 7 


RSDELRR 912 


DRSHLQT 1117 


r~ r\ r\ 

50 0 


298 


GCTTAGGCA 


503 


QTGELRR 708 


RSDNLQK 913 


TSGDLSR 1118 


4 0 0 0 


299 


GCTTAGGCA 


504 


QTSDLRR 709 


RSDNLQK 914 


QSSDLQR 1119 


a t~\ r\ r\ 

4 0 0 0 


300 


GCTTAGGCA 


505 


QTADLRR 710 


RSDNLQR 915 


QSSDLSR 1120 


4 0 0 


301 


GCTTAGGCA 


506 


QSADLRR 711 


RSDNLQT 916 


QSSDLSR 1121 


3 5 0 


302 


GCTTAGGCA 


507 


QSGSLTR 712 


RSDNLQT 917 


QSSDLSR 1122 


75 


303 


GCTTAGGCA 


508 


QTGSLTR 713 


RSDNLQT 918 


QSSDLSR 1123 


135 


304 


GCTTAGGCA 


509 


QTADLTR 714 


RSDNLQT 919 


QSSDLSR 1124


230


305 


GCTTAGGCA 


510 


QTGDLTR 715


RSDNLQT 920


QSSDLSR 1125 


230 


306 


GCTTAGGCA 


511 


QTASLTR 716 


RSDNLQT 921 


QSSDLSR 1126 


280 


307 


GAAGAAGCG 


512 


RSDELRR 717 


QSGNLQR 922


QSGNLSR 1127


50.5


308


GAAGAAGCG


513


RSDELRR 718


QSANLQR 923


QSANLQR 1128


1000


309


GGAGATGCC


514


ERSDLRR 719


QSSNLQR 924


QSGHLSR 1129


4000


310


GGAGATGCC


515


DRSDLTR 720


NRDNLQT 925


QSGHLSR 1130


1000 


311 


GGAGATGCC 


516 


DRSTLTR 721 


NRDNLQR 926


QSGHLSR 1131 


170 


312 


GGAGATGCC 


517 


ERGTLAR 722


NRDNLQR 927


QSGHLSR 1132 


2000 


313 


GGAGATGCC 


518 


DRSDLTR 723 


QRSNLQR 928


QSGHLSR 1133


1000 


314 


GGAGATGCC 


519 


DRSSLTR 724 


QSSNLQR 929


QSGHLSR 1134 


117.5 


315 


GGAGATGCC 


520


ERGTLAR 725


QSSNLQR 930


QSGHLSR 1135


265


316


GGAGATGCC


521


ERGTLAR 726


QRDNLQR 931


QSGHLSR 1136


3000 


318 


TAGGAGATGC 


522 


RSDALTS 727


RSDNLAR 932


RSDNLAS 1137


100


319


GGGGAAGGG


523


KTSHLRA 728


QSGNLSR 933


RSDHLSR 1138


125 


320 


GGGGAAGGG 


524 


RSDHLTR 729


QSGNLSR 934


RSDHLSR 1139


5


321


GGCGGAGAT


525


TTSNLRR 730


QSGHLQR 935


DRSHLTR 1140


200 


323 


GGCGGAGAT 


526 


TTSNLRR 731 


QSGHLQR 936


DRDHLTR 1141 


600 


324 


GGCGGAGAT 


527 


TTSNLRR 732


QSGHLQR 937


DRDHLTR 1142


200


325


GTATCTGCT


528


NSSDLTR 733


NSDVLTS 938


QSDVLTR 1143


1000


326


GTATCTGTT


529


NSDALTR 734


NSDVLTS 939


QSDVLTR 1144


1000


327


TCTGCTGGG


530


RSDHLTR 735


NSADLTR 940


NSDDLTR 1145


1000


328


TCTGTTGGG


531


RSDHLTR 736


NSSALTS 941


NSDDLTR 1146


1000


349


GGTGTCGCC


532


DCRDLAR 737


DSGSLTR 942


TSGHLTR 1147


1000


350


TCCGAGGGT


533


TSGHLTR 738


RSDNLTR 943


DCRDLTT 1148


332


351


GCTGGTGTC


534


DSGSLTR 739


TSGHLTR 944


TLHTLTR 1149


1000 











bJb 


T~) O TD O T T TD *~1 A r\ 

KbDbLiijR 74 0 


n CTDT-TT rpn rD /I r~ 

KbUhLIK 94 5 


QSDHLTR 1150 


26


*3 C "3 


/— irprpr-i n7\ nnp 


536


DCRDLAR 741 


QbDHLIR 946 


TSGALTR 1151 


1000


J b4 


CjAACjjACjCjtAL. 


537


DSSNLTR 742


RSDNLTR 947


QRSNLVR 1152 


28 


"2 C r- 
JJ J 


(jAALjACj(jjAC 


coo 
b Jo 


T71 TV" 7\ "NTT rpn T /I D 


TD O TDTvTT rpn C\ A O 

KbDJNJLIK 948 


QRSNLVR 1153 


20


Job 




b jy 


DCHUT DD 1 A A 

KbJJiiljKK /44 


D CHUT rp-r/- q /l <D 


DSDHLSR 1154


1000


J J / 


GGCTGGGCG 


540 


KoJJJiijKK /4 b 


D CHUT T T7 qt a 

KbJJrli_ilis. ybU 


TDOTDTJTT O TD "1 "1 I~ rr 


1000


O CO 

j o o 


GGCTGGGCG 


541 


KoJJiiljKK /4b 


dchut fpv q cr ~i 
KbUrlJ_i 1 iv yoi 


Ubbri-LibK llbo 


O n r 

zzb 


J D JL 


GGGTTTGGG 


542 


KbJJnJjiK /4 / 


/"D O O A T rpTD n c o 

ybbALilK 9b2 


RSDHLTR 1157


130


*2 d 


GGGTTTGGG 


543 


RSDHLTR 748


QSSVLTR 953


RSDHLTR 1158


200 


J 6 4 


GTGTCCGAAG 544 


RSDNLTR 749


DSAVLTT 954 


RSDSLTR 1159 


1000 


*5 £T C 

job 


GGTGCTGGT 


545 


QASHLIR 750 


QASVLTR 955 


QASHLTR 1160 


600 


ODD 


GAGGGTGCT 


546 


QAbVLlR /bl 


QASHLTR 956


RSDNLTR 1161 


1000 


*3 £C *7 


GGGGGCGGG 


547 


n 0 t— \ T_T T rpn "~7 IT *D 

KbUHLlK /bz 


DSGHLTR 957


RSDHLQR 1162 


60 


*5 ^ Q 
JOO 


GAGGGGGCG 


548 


TD nr\DT rpn "-7 r- -5 


RSDHLTR 958


RSDNLTR 1163


3.5


*3 ^ Q 

joy 


GTAGTTGTG 


549 


KbDALlR /b4 


TGGSLAR 959


QSGSLTR 1164 


95 


*5 ^ n 


GTAGTTGTG 


550 


D CHA T TO i r r 

KbJJALlK /bb 


NRATLAR 960 


QSASLTR 1165 


300


O / J_ 


GTAGTTGTG 


551 


DCHAT nrn "D rr 

Kb DAL IK /bb 


NRATLAR 961 


QSGSLTR 1166 


175 




GTAGTTGTG 


552 


TO O TD O T T TD T r n 

KbDbLLR /b / 


TGGSLAR 962 


QSASLTR 1167 


112.5 


O / J 


\J L £\\J 1 lulu 


c; cr 0 
j jj 


D OnCT T D "ICQ 

KbJJbJ-iijK /bo 


"KTTD 7\ rpT Tt TD (D *"> 

NRATLAR 963 


QSASLTR 1168 


320


7 "7 4 


GCTGAGGAA 


554 


^KbJNJijVK /by 


RSDNLTR 964


TSSELQR 1169


3.3


'37c 


GAGGAAGAT 


555 


f\ /~\ O "NTT 7\ TD «-7 /- 

QQbNLAR / 6 0 


QSGNLQR 965


RSDNLTR 1170 


85 


077 

Oil 


GTGTTGGCAG 


556 


/D COOT mD *D 1 

ybLrbLilK /bl 


KGDALTS 966


RSDALTR 1171


89 


T7Q 


GCCGAGGAGA 


557 


RSDNLTR 762


RSDNLTR 967


DRSSLTR 1172 


31 


n "~7 q 

j / y 


GCCGAGGAGA 


558 


RSDNLTR 763 


RSDNLTR 968


ERGTLAR 1173 


3 


*3 D O 
J O U 


GAGTCGGAAG 


559 


QSANLAR 764


RSDELTT 969 


RSDNLAR 1174 


1000 


O O X 


GCAGCTGCGC 


560 


KbJJhjijlK 76b 


QSSDLQR 970


QSGDLTR 1175


1.5


TOT 

Jo J 


TGGTTGGTAT 


561 


QbAlLAR 766 


RGDALTS 971 


RSDHLTT 1176 


1000 


oo4 


GTGGGCTTCA 


562 


DRSALTT 767


DRSHLAR 972 


RSDALAR 1177 


60 


385 


GGGGCGGAGC 


563 


RSDNLTR 768 


RSDTLKK 973 


RSDHLSR 1178 


1.2


386 


GGGGCGGAGC 


564 


RSDNLTR 769 


RSDELQR 974 


RSDHLSR 1179 


0.4


387 


GGCGAGGCAA 


565 


QSGSLTR 770 


RSDNLAR 975 


DRSHLAR 1180 


2.5


388 


GGCGAGGCAA 


566 


QSGDLTR 771 


RSDNLAR 976 


DRSHLAR 1181 


28 


390 


GTGGCAGCGG 


567 


RSDTLKK 772 


QSSDLQK 977 


RSDALAR 1182 


20 







392


GTGGCAGCGG 568


RSDELTR 773


QSSDLQK 978


RSDALAR 1183


1000


396


GCGGGAGCAG 569


QSGSLTR 774


QSGHLQR 979


RSDTLKK 1184


18.8


397


GCGGGAGCAG 570


QSGDLTR 775


QSGHLQR 980


RSDTLKK 1185


25


400


TCAGTGGTGG 571


RSDALAR 776


RSDSLAR 981


QSGDLRT 1186


40


405


GCGGCCGCA


572


RSDELTR 777


ERGTLAR 982


RSDERKR 1187


110


406


GCGGCCGCA


573


RSDELTR 778


DRSSLTR 983


RSDERKR 1188


110


407


GCGGCCGCA


574


QSWELTR 779


ERGTLAR 984


RSDERKR 1189


410


408


GCGGCCGCA


575


QSWELTR 780


DRSSLTR 985


RSDERKR 1190


380


409


GCGGCCGCA


576


QSGSLTR 781


ERGTLAR 986


RSDERKR 1191


50


410


GCAGAAGTC


577


RSDALTR 782


QSGNLTR 987


QSGSLTR 1192


3


411


GCGGCCGCA


578


QSGSLTR 783


RSDHLTT 988


RSDERKR 1193


1000


412


GCGTGGGCG


579


QSGSLTR 784


RSDHLTT 989


RSDERKR 1194 


5 


413 


GCGTGGGCA 


580 


QSGSLTR 785 


RSDHLTT 990 


RSDERKR 1195 


5 


414 


GCAGAAGCA 


581 


RSDELTR 786


QSANLQR 991


QSGSLTR 1196


1000


415


GTGTGCGGA


582


DRSHLTR 787


ERHSLQT 992


RSDALTR 1197


1000


416


TGTGCGGCC


583


ERGTLAR 788


RSDELRR 993


DRSHLQT 1198 


1000 


4 y j 


GGGGTGGCGG 


584 


RSDTLKK 789 


RSDSLAR 994 


RSDHLSR 1199 


300 


4 y 4 


GCCGAGGAGA 


585 


RSDNLTR 790


RSDNLTR 995


DRSSLTR 1200


90 


4 y o 


GGTGGTGGC 


586 


DTSHLRR 791


TSGHLQR 996


TSGHLSR 1201


1000 


4 y / 


GTTTGCGTC 


587


ETASLRR 792


DSAHLQR 997


TSSALSR 1202 


1000 


4 y o 


GAAGAGGCA 


588 


QTGELRR 793


RSDNLQR 998


QSGNLSR 1203


30 


/ion 

4 y y 


GCTTGTGAG 


589 


RTSNLRR 794


TSSHLQK 999


DTDHLRR 1204


1000 


r n a 


GCTTGTGAG 


590 


RSDNLTR 795


QSSNLQT 1000


DRSHLAR 1205


1000


501


GTGGGGGTT


591


NRATLAR 796


RSDHLSR 1001


RSDALAR 1206


8


502


GGGGTGGGA


592


QSAHLAR 797


RSDALAR 1002


RSDHLSR 1207


60 


c n *7 
bU / 


GAGGTAGAGG 


593 


RSDNLAR 798


QRSALAR 1003


RSDNLAR 1208


10 


r a o 

b 0 o 


GAGGTAGAGG 


594 


RSDNLAR 799


QSATLAR 1004


RSDNLAR 1209


10 


r~ r\ c\ 

b o y 


GTCGTGTGGC 


595 


RSDHLTT 800


RSDALAR 1005


DRSALAR 1210


100 


510 


GTTGAGGAAG 


596 


QSGNLAR 801 


RSDNLAR 1006


NRATLAR 1211


100


511


GTTGAGGAAG


597


QSGNLAR 802


RSDNLAR 1007


QSSALAR 1212


100


512


GAGGTGGAAG


598


QSGNLAR 803


RSDALAR 1008


RSDNLAR 1213


10


513


GAGGTGGAAG


599


QSANLAR 804


RSDALAR 1009


RSDNLAR 1214


1.5


514


TAGGTGGTGG


600


RSDALTR 805


RSDALAR 1010 


RSDNLTT 1215 


10 







515 


TGGGAGGAGT 601 


RSDNLTR 806 


RSDNLTR 1011 


RSDHLTT 1216 


0.5


516 


GGAGGAGCT 


602 


TTSELRR 807


QSGHLQR 1012 


QSGHLSR 1217 


700 


517 


GGAGCTGGGG 603 


RTDHLRR 808 


TSSELQR 1013 


QSGHLSR 1218 


50 


518 


GGGGGAGGAG 604


QTGHLRR 809


QSGHLQR 1014


RSDHLSR 1219


30


519


GGGGAGGAGA 605


RSDNLAR 810


RSDNLSR 1015


RSDHLSR 1220


0.3


520


GGAGGAGAT


606


TTANLRR 811


QSGHLQR 1016


QSGHLSR 1221


300


521


GCAGCAGGA 


607 


QTGHLRR 812 


QSGELQR 1017 


QSGELSR 1222 


1000 


522 


GATGAGGCA 


608 


QTGELRR 813 


RSDNLQR 1018 


TSANLSR 1223


200 


527 


GGGGAGGATC 


609 


TTSNLRR 814 


RSSNLQR 1019 


RSDHLSR 1224 


2 


528 


GGGGAGGATC 


610 


TTSMLRR 815 


RSSNLQR 1020


RSDHLSR 1225


10


529


GAGGCTTGGG 


611 


RTDHLRK 816 


TSAELQR 1021 


RSSNLSR 1226 


1000


531


GCGGAGGCTT 


612 


TTGELRR 817


RSSNLQR 1022


RSDELSR 1227


160


532


GCGGAGGCTT 


613 


QSSDLQR 818


RSSNLQR 1023


RSDELSR 1228


100


533


GCGGAGGCTT


614


QSSDLQR 819


RSDNLAR 1024


RSADLSR 1229


7


534


GCGGAGGCTT


615


QSSDLQR 820


RSDNLAR 1025


RSDDLRR 1230


10


535


GCAGCCGGG 


616 


RTDHLRR 821 


ESSDLQR 1026


QSGELSR 1231 


1000 


C o 

b -5 o 


GCAGAGGCTT 


617 


QSSDLQR 822


RSDNLAR 1027


QSGSLTR 1232 


70 


540


TGGGCAGGCC 


618


DRSHLTR 823


QSGSLTR 1028


RSDHLTT 1233


55 




GGGGAGGAT 


619 


TTSNLRR 824 


RSSNLQR 1029


RSDHLSR 1234 


3 


570


GGGGAAGGCT 


620 


DSGHLTR 825


QRSNLVR 1030


RSDHLTR 1235


20 


571


GTGTGTGTGT 


621


RSDSLTR 826


QRSNLVR 1031


RSDSLLR 1236


1000


572


GCATACGTGG 


622 


RSDSLLR 827 


DKGNLQS 1032 


QSDDLTR 1237


1000 


573


GCATACGTG 


623 


RSDSLLR 828


DKGNLQS 1033


QSGDLTR 1238


1000 


574




624


RSDHLTR 829


RSDHLTR 1034


DKGNLQT 1239


25 


rnr 

b / b 


TACGTGGGCT 


625 


DFSHLTR 830


RSDHLTR 1035


DKGNLQT 1240


472 


c: n a 
b / b 


GAGGGTGTTG 


626 


NSDTLAR 831


TSGHLTR 1036


RSDNLTR 1241


200


577


GGAGCGGGGA 


627 


RSDHLSR 832


RSDELQR 1037


QSDHLTR 1242 


200 


579 


GGGGTTGAGG 


628 


RSDNLTR 833


NRDTLAR 1038


TSGHLTR 1243


200


580


GGTGTTGGAG


629


QRAHLAR 834


NRDTLAR 1039


TSGHLTR 1244


1000


581


TACGTGGGTT


630


QSSHLTR 835


RSDSLLR 1040


DKGNLQT 1245


382


583


GTAGGGGTTG


631


NSSALTR 836


RSDHLTR 1041


QSASLTR 1246


46


584


GAAGGCGGAG


632


QAGHLTR 837


DKSHLTR 1042


QSGNLTR 1247


1000


585


GAAGGCGGAG


633


QAGHLTR 838


DSGHLTR 1043


QSGNLTR 1248


1000 






587 


GGGGGTTACG 


634 


DKGNLQT 


839 


TSGHLTR 


1044 


RSDHLSK 


1249 


500 


588 


GGGGGGGGGG 


635 


RSDHLSR 


840 


RSDHLTR 


1045 


RSDHLSK 


1250 


30


589 


GGAGTATGCT 


636 


DSGHLAS 


841 


QSATLAR 


1046 


QSDHLTR 


1251


1000


595 


TGGTTGGTAT 


637 


QRGSLAR 


842 


RGDALTR 


1047


RSDHLTT


1252


73.3


597 


TGGTTGGTA 


638 


QNSAMRK 


843 


RGDALTS 


1048 


RSDHLTT 


1253 


1000 


598 


TGGTTGGTA 


639 


QRGSLAR 


844 


RDGSLTS 


1049 


RSDHLTT 


1254 


1000 


599 


TGGTTGGTA 


640 


QNSAMRK 


845 


RDGSLTS 


1050 


RSDHLTT 


1255 


1000 


600 


GAGTCGGAA 


641 


QSANLAR 


846 


RSDELRT 


1051 


RSDNLAR 


1256 


206.7 


601 


GAGTCGGAA 


642 


RSANLTR 


847 


RLDGLRT 


1052 


RSDNLAR 


1257 


606.7 


602 


GAGTCGGAA 


643 


RSANLTR 


848 


RQDTLVG 


1053 


RSDNLAR 


1258 


616.7


603 


GAGTCGGAA 


644 


QSGNLAR 


849 


RSDELRT 


1054 


RSDNLAR 


1259 


166.7


606 


GGGGAGGATC 


645 


TTSNLRR 


850 


RSDNLQR 


1055 


RSDHLSR 


1260 


0.2 





TABLE 3 







SBS#  TARGET  SEQ ID  F1  SEQ ID  F2  SEQ ID  F3  SEQ ID  Kd (nM)


/ 


GAGGAGGTGA 1261


p cnsT &p 


1 7 A 7 


rv 0 LJV4 i_ i^irv x ft 3 3 


rv O X>1\ Xj V I\. X 3 X 


0 0 7 


0Z0 


GCGGAGGACC 


1262 




1 7 zl ft 


PQFlNTT.aP 1474. 
rv 0 J_J1N i_i.r-irv 11 ji 


PQDPPTCP 1 con 
rv o J_J i_i rv rviv X3^u 


0 1 


OO'i 


GAGGAGGTGA 1263




1 7 4 Q 


I\OJJlM J_lM.I\. X ft j 3 


rv o j_Ji>j xj v rv x 3 ^ _l 


n i r 


O X / 


GAGGAGGTGA 1264




1 7 <=; n 

X 3 3 W 


IV O J— /IN J_J -rt.lv _L 0 


IV O J—zIn JUirirv X 3 ^ 


n 7i 

w * -J J- 


£ £ £ 

ODD 


GCGGAGGCGC 


1265 


P QT)"nT.TP 

-TV 0JJJJL1 X JA. 


X J 3 X 


rv 0 jjin J_j j. rv x *± 3 / 


rv O LJ X XJXvXN. X 3 -J 




0 ^ y 


GCGGAGGACC 


1266 


TT>T^"7\ -NTT TP 

ill J\/\i\J Xj ± rC 


1 O CO 


P CPlMT a P 1 A 7 ft 
rv 0 JJ1\ J-jj-irv X ft 3 o 


P QFiTT T^K" 1 c;9d 
RaJJlJu rv xv 1 3 / 4 


n R9 


D / U 


GACGTGGAGG 


1267 


P QFYNTT I\D 
rv 0 U1NJ Xi/\rC 


1 *3 cr "3 
X j 3 3 




TIP QNTT TP 1 COC 
LJrvolNJ Xj X rv 1 DZ j 


n r *7 

U . 3 / 


O U X 


AAGGAGTCGC 


1268 


P Q 7\ pit PT 


X j 3 1 


PQDKTT.aP ~\ AAC\ 

IviZ> l_Jx.N 1 iM r\ X ft ft U 


IV O JJJ.N Ul^ 1 JZ D 


0 ft R 
U . O J 


D D O 


GTGGAGGCCA 


1269 


rLirvvj 1 Xii-irC 


IjDj 


P QTYNTT A P 1 ZL Zl 1 
rvo LJIN J_L/-irv X ft ft X 


Iv O LJJ-\LjJr\rL 1DZ / 


1 17 

X . X 3 


O J7 3 


ATGGATTCAG 


1270 


^oriuXi l iv 


X 3 3 O 


TCPIVTT VP 1 A A 1 
L O vjl\J Xj V rv X ft ft 


PQFlZiT TO 1 R9fi 


1 zl 

X . ft 


/ ZPZP 


GGGGGAGCTG 


1271 


nc cm op 


1 *) C7 
IjD / 


OP A T-IT "CPP 1 /L A "3 




1 PR 

X . O 3 


TOO 


GGGGGAGCTG 


1272 


CRT OP 
^ 0 0 J_J Xj K. 


1 7 ft 
X .3 3 O 


ncfTJT OP 1 AAA 
^ovjriXi^rc x ft ft ft 


PQDWT CP i cTf\ 


3 


O ft Z 


GAGGTGGGCT 


1273 


"HP CUT TP 


1 "3 R Q 
X 3 3 -7 


KnNM 1 iA K J_ ft ft 3 


P C T\ TVTT a P 1 R'JI 
rv o l_i/\rv X 3 ,5 X 


R A 
3 . ft 


O 27 ft 


TCAGTGGTAT 


1274 


HD C7\ T.A p 


X j D \J 


PQTiZiT QP 1 AAfc 

IvaUriJjDK. X ft ft D 


^OxxIJx_i _L rv X 3 -j ^ 


^ 1 ^ 
D . X 3 


O ¥ A 


ATGGATTCAG 


1275 


ACUTlT TV 
yoniJX 1 rv 


1 ^ £T 1 


OOCTtfT VP 1 A A H 

^^orviXjvrv x ft ft / 


PCRAT TO 1 C "5 


O . ^ 


QOQ 
OOO 


TCAGTGGTAT 


1276 


OQCCT VP 


1 'X £1 9 
1 J DZ 


PQFiZiT QP 1 AAQ 
rv o u/\Xj o rv x ft ft o 


yonuij ± rv x 3 3 ft 


1 A 

X f± 


/ J y 


GCGGGCGGGC 


1277 


P CHUT TP 


X J b 3 


DppUT TP 1 A A Q 
rirvvjrIXj 1 rv Xftft;? 


P CPiTiT PP 1 c: o cr 
rCo J_JxJljrvi\. 1 j jj 


X D . 3 




CAGGCTGTGG 


1278 


P QTl2\ T TP 


1 7 ^ A 
X 3 O ft 


OQQFiT TP 1 A n 
lyiooUL ± rv Xft3U 


i\oJJl\ xjrvJZj x 3 J O 


1 7 
x / 


TOT 
/ 17 / 


GCAGAGGCTG 


1279 


nQCfiT OP 


1 J U J 


P Q TTMT A P 1 AC1 
rv o U1M Xji-irv x ft 3 X 


yovjL/Jj ± rv x 3 3 / 


17 R 

X / . 3 


QQ1 

0 y X 


TCAGTGGTAT 


1280 




1 "3 £T £T 


DCriAT CD 1 A en 


ococtpt i c; ^ Q 

l^obo -UK. 1 X330 


1ft ^ 
X O . 3 


887 


TCAGTGGTAT 


1281 


QRSAIAR 


1367 


RSDALSR 1453


QSGDLRT 1539


23.75


672


TCGGACGTGG 1282


RSDALAR


1368


DRSNLTR 1454


RSDELRT 1540


24


836


GGGGAGGCCC


1283


ERGTLAR


1369


RSDNLAR 1455


RSDHLSR 1541


24.25


674


GCGGCGTCGG 1284


RSDELRT


1370


RADTLRR 1456


RSDTLKK 1542


27.5


849


GGGGCCCTGG 1285


RSDALRE


1371


DRSSLTR 1457


RSDHLTQ 1543


29.05


825


GAATGGGCAG 1286


QSGSLTR


1372


RSDHLTT 1458


QSGNLTR 1544 


37.3 


673 


GCGGGTGTCT 


1287 


DRSALAR 


1373
20
0.7
ATGGAAGAT
0 .
1491 AAGGGTTCA 2503 NKTDLGK 3003 DSSKLSR 3503 RLDNRTA 4003 4.12

1492 AAGTGGTAG 2504 RSDNLTT 3004 RSDHLTT 3504 RSDNLTQ 4004 1.37

1493 AAGTGGTAG 2505 RSDNLTT 3005 RSDHLTT 3505 RLDNRTQ 4005 15.09

1494 GGGTTTGAC 2506 DRSNLTR 3006 QRSALAS 3506 RSDHLSR 4006 0.255

1496 TTGGGGGAG 2507 RSDNLAR 3007 RSDHLTR 3507 RSDALTT 4007 0.065

1497 GAGGCTCTT 2508 QSSALAR 3008 QSSDLTR 3508 RSDNLAR 4008 0.007

1498 GAGGTTGAT 2509 QSSNLAR 3009 QSSALTR 3509 RSDNLAR 4009 0.101

1499 GAGGTTGAT 2510 QSSNLAR 3010 TSGALTR 3510 RSDNLAR 4010 0.02

1500 GCAGAGGAA 2511 QSGNLAR 3011 RSDNLAR 3511 QSGSLTR 4011 0.003

1522 GCAATGGGT 2512 TSGHLVR 3012 RSDALTQ 3512 QSGDLTR 4012 0.08



